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Abstract 

Computing  the  singular  values  of  a  bidiagonal  matrix  is  the  final  phase  of  the  standard  algo- 
rithm for  the  singular  value  decomposition  of  a  general  matrix.  We  present  a  new  algorithm 
which  computes  all  the  singular  values  of  a  bidiagoneil  matrix  to  high  relative  accuracy 
independent  of  their  magnitudes.  In  contrast,  the  standard  algorithm  for  bidiagonal  matrices 
may  compute  small  singular  values  with  no  relative  accuracy  at  all.  Numerical  experiments 
show  that  the  new  algorithm  is  comparable  in  speed  to  the  standard  algorithm,  and  frequently 
faster.  We  also  show  how  to  accurately  compute  tiny  eigenvalues  of  some  classes  of  sym- 
metric tridiagonal  matrices  using  the  same  technique. 


1.  Introduction 

The  standard  algorithm  for  computing  the  singular  value  decomposition  (SVD)  of  a  gen- 
eral real  matrix  A  has  two  phases  [Golub  and  Kahan]: 

1)  Compute  orthogonal  matrices  />]  and  Qi  such  that  B  =  pJaQ^  is  in  bidiagonal  form, 
i.e.  has  nonzero  entries  only  on  its  diagonal  and  first  superdiagonal. 

2)  Compute  orthogonal  matrices  P2  and  Q2  such  that  2  =  P2BQ2  is  diagonal  and  nonnega- 
tive.  The  diagonal  entries  a,  of  2  are  the  singular  values  of  A.  We  wUl  take  them  to  be 
sorted  in  decreasing  order:  a,aa,j.i.  The  columns  of  Q  =  Q]Q2  are  the  right  singular 
vectors,  and  the  columns  of  P  =  PyP2  are  the  left  singular  vectors. 

Phase  1  may  be  done  in  a  finite  number  of  steps  whereas  phase  2  is  iterative,  although 
in  practice  convergence  is  usually  quite  fast.  The  error  analysis  of  this  combined  procedure  is 
well  known,  and  provided  neither  overflow  nor  underflow  occurs  may  be  summarized  as  fol- 
lows: 

The  computed  singular  values  a,  differ  from  the  true  singular  values  of  A  by  no  more 
than /(n)-e-||A||,  where  \\A\\  is  the  2-norm  of  A,  c  is  the  machine  precision,  and/(n)  is 
a  slowly  growing  function  of  the  dimension  n  ol  A. 

This  is  a  generally  satisfactory  conclusion,  since  it  means  the  computed  singular  values 
have  errors  no  larger  than  the  uncertainty  in  the  largest  entries  of  /t,  if  these  are  themselves 
the  results  of  previous  computations.  In  particular,  singular  values  not  much  smaller  than  ||A|| 
are  computable  to  high  relative  accuracy.  However,  small  singular  values  may  change  com- 
pletely, and  so  cannot  generally  be  computed  with  high  relative  accuracy. 


There  are  some  situations  where  the  smallest  singular  values  are  determined  much  more 
accurately  by  the  data  than  a  simple  bound  of  the  form  /(n)e||i4||  would  indicate.  In  this 
paper  we  will  show  that  for  bidiagonal  matrices  the  singular  values  are  determined  to  the 
same  relative  precision  as  the  individual  matrix  entries.  In  other  words,  if  all  the  matrix 
entries  are  known  to  high  relative  accuracy,  all  the  singular  values  are  also  known  to  high 
relative  accuracy  independent  of  their  magnitudes.  We  will  prove  similar  theorems  about  the 
eigenvalues  of  two  kinds  of  symmetric  tridiagonal  matrices:  those  with  zero  diagonal,  and 
diagonally  dominant  ones  (diagonal  dominance  will  be  defined  in  section  S  below;  it  includes 
certain  graded  matrices). 

In  such  situations  it  is  desirable  to  have  an  algorithm  to  compute  the  singular  values  or 
eigenvalues  to  the  accuracy  to  which  they  are  determined  by  the  data.  In  this  paper  we 
present  an  algorithm  for  computing  all  the  singular  values  of  a  bidiagonal  matrix  to 
guaranteed  high  relative  accuracy,  independent  of  their  magnitudes.  Our  algorithm  is  a  varia- 
tion of  the  usual  QR  iteration  which  is  used  in  the  standard  SVD  algorithm.  Briefly,  it  is  a 
hybrid  algorithm  of  the  usual  QR  iteration  with  a  "zero-shifted"  QR  modified  to  guarantee 
forward  stability.  Numerical  experience,  which  we  report  below,  shows  that  it  is  generally 
faster  than  the  standard  algorithm,  and  ranges  from  3.6  times  faster  to  1.8  times  slower 
counting  reduction  to  bidiagonal  form  (7.7  times  faster  to  2.6  times  slower  not  counting 
reduction  to  bidiagonal  form).  This  algorithm  may  be  also  be  used  to  accurately  compute  all 
the  eigenvalues  of  symmetric  tridiagonal  matrices  with  zero  diagonal,  and  positive  definite 
diagonally  dominant  symmetric  tridiagonal  matrices  (this  category  includes  graded  matrices). 

The  rest  of  this  paper  is  organized  as  follows.  Section  2  presents  perturbation  theory  for 
the  singular  values  of  a  bidiagonal  matrix,  and  shows  that  small  relative  perturbations  in  the 
nonzero  entries  of  a  bidiagonal  matrix  can  only  cause  small  relative  perturbations  in  its  singu- 
lar values.  We  also  present  a  theorem  which  says  when  an  offdiagonal  entry  can  be  set  to 
zero  without  making  large  relative  perturbations  in  any  singular  value;  this  theorem  is  the 
basis  of  the  convergence  criterion  for  the  new  algorithm.  Section  3  presents  the  algorithm, 
which  is  QR  iteration  with  a  "zero  shift,"  modified  to  be  forward  stable.  This  forward  stabil- 
ity combined  with  the  perturbation  theorem  of  section  2  shows  that  QR  can  compute  all  the 
singular  values  with  high  relative  accuracy.  Section  4  discusses  convergence  criteria  for  the 
new  algorithm,  since  the  convergence  criteria  for  the  standard  algorithm  can  cause  unaccept- 
ably  large  perturbations  in  small  singular  values.  It  also  discusses  the  practical  algorithm, 
which  is  a  hybrid  of  the  standard  algorithm  and  the  algorithm  of  section  3.  Section  S  presents 
perturbation  theory  for  the  eigenvalues  of  diagonally  dominant  positive  definite  symmetric 
tridiagonal  matrices,  and  shows  how  to  accurately  compute  all  their  eigenvalues.  Section  6 
shows  how  to  use  the  perturbation  theory  of  section  2  and  Sylvester's  Law  of  Inertia  to  com- 
pute the  singular  values  of  a  bidiagonal  matrix  to  high  relative  accuracy;  this  will  be  used  to 
verify  the  results  in  section  7,  which  discusses  numerical  experiments.  Section  7  also 
addresses  the  implications  of  our  results  for  the  "perfect  shift"  strategy  for  computing  singu- 
lar vectors.  Section  8  contains  a  detailed  error  analysis  of  the  new  algorithm.  Some  details 
of  the  implementation,  including  susceptibility  to  overflow  and  underflow,  are  discussed  in 
section  9.  Section  10  discusses  the  accuracy  of  the  computed  singular  vectors;  a  complete 
analysis  of  this  remains  an  open  question.  Section  11  contains  suggestions  for  parallel  ver- 
sions of  the  algorithms  presented,  open  questions,  and  conclusions. 


2.  Pertnrbation  Theory  for  Singular  Valaes  of  Bidiagonal  Matrices 

We  say  8a  is  a  relative  perturbation  of  a  of  size  at  most  -n  if  \ha  \^  t\\a\.M  A  and  hA 
are  matrices,  we  will  let  \A  \  and  |8i4  |  denote  the  matrices  of  absolute  entries  of  A  and  8i4. 
We  will  say  that  hA  is  a  componentwise  relative  perturbation  of  A  of  size  at  most  t\  if 
\hA  I  s  11  |A  I,  where  the  inequality  is  understood  componentwise. 

In  this  section  we  will  prove  two  perturbation  theorem  for  singular  values  of  bidiagonal 
matrices.  The  first  theorem  is  needed  to  prove  that  our  new  QR  iteration  does  not  disturb 
any  singular  values,  and  the  second  theorem  justifies  our  new  convergence  criterion  (see  sec- 
tion S  below). 

The  first  theorem  shows  that  if  8B  is  a  componentwise  relative  perturbation  of  size  t\  of 
the  n  by  n  bidiagonal  matrix  B,  then  the  singular  values  a/  of  B  +  65  will  be  relative  pertur- 
bations of  the  singular  values  ct,  of  B  of  size  less  than  about  {In  -  \)i\,  provided  (2n  -  1)ti  is 
small  compared  to  1.   More  precisely  we  will  show  that 

(i-T,)2-'s  ^s(i--n)'-2- 

(recall  that  ct,'  and  a,  are  sorted  in  decreasing  order).  This  will  follow  as  a  corollary  of  a 
more  general  result  for  symmetric  tridiagonal  matrices  with  zero  diagonal. 

The  second  theorem  says  when  we  can  set  an  offdiagonal  entry  of  a  bidiagonal  matrix  B 
to  zero  without  making  large  relative  perturbations  in  the  singular  values.  It  is  based  on  a 
simple  recurrence  for  estimating  the  smallest  singular  value  of  a  bidiagonal  matrix;  if  setting 
an  offdiagonal  entry  of  B  to  zero  cannot  change  this  recurrence  significantly,  we  show  that  no 
singular  value  can  be  changed  significantly  either. 

The  proof  of  the  first  theorem  depends  on  Sylvester's  Law  Of  Inertia  [Gantmacher, 
p.297]: 

Sylvester's  Law  Of  Inertia:  Let  A  be  symmetric  and  U  be  nonsingular.  Then  A  and  UAU^ 
have  the  same  number  of  positive,  zero  and  negative  eigenvalues. 

In  particular,  suppose  A  is  symmetric  and  tridiagonal,  with  diagonal  entries  ai a„ 

and  offdiagonal  entries  b\,  .  .  .  ,b„-\.  Then  via  Gaussian  elimination  without  pivoting  one 
can  write  A  -xI  =  LDL^ ,  where  L  is  unit  lower  triangular  and  bidiagonal,  and  D  is  diagonal 
with  entries  d,  given  by  the  recurrence  [Parlett,  p.  47] 

rfi  =  a,  -  X 

,  (2-1) 

di  =  a,-x  -  bf-i  I  di-i 

This  recurrence  will  not  break  down  {d,=  0  for  some  i<n)  as  long  as  x  is  not  one  of  the 
n(n -l)/2  eigenvalues  of  leading  submatrices  of  A.  Then  by  Sylvester's  Law  of  Inertia,  the 
numbers  of  eigenvalues  of  A  less  than  x,  equal  to  x,  and  greater  than  x  are  precisely  the 
numbers  of  <f,  which  are  negative,  zero  and  positive,  respectively. 

We  will  also  need  the  following  classical  eigenvalue  perturbation  theorem  due  to  Weyl: 

Theorem  1:  [Parlett,  p.  191]  Let  XjS  •  •  •  aX„  be  the  eigenvalues  of  the  symmetric  matrix  A, 
and  Xi'a  •  •  •  a:X„'  be  the  eigenvalues  values  of  the  symmetric  matrix  A+hA.  Then 
-||8A||  s  X„i„(6i4)  £  X,'-X,  s  X„,„(8A)  £  \\hA\\.  Here,  X„u,  and  X„„  denote  the  smallest 
and  largest  eigenvalues,  respectively. 

Now  we  present  our  central  result  of  this  section  (a  slightly  weaker  version  originally 
appeared  in  an  unpublished  report  [Kahan]): 

Theorem  2:  Let  /  be  an  n  by  n  symmetric  tridiagonal  matrix  with  zero  diagonal  and  offdiago- 
nal entries  i>i,  .  .  .  ,fc„-i.  Suppose  J  +  hJ  is  identical  to  J  except  for  one  offdiagonal  entry, 
which  changes  to  ab,  from  b,,  a^tO.  Let  A  =  max(|a|  ,  |a~'|).  Let  X,  be  the  eigenvalues 
of  /  sorted  into  decreasing  order,  and  let  X/  be  the  eigenvalues  of  J  +  hJ  similarly  sorted. 


Then 

^  s  —  s  A    .  (2.2) 

AX, 

In  other  words,  changing  any  single  entry  of  /  by  a  factor  a  can  change  no  eigenvalue  by 
more  than  a  factor  |a  |. 

Proof:  Assume  without  loss  of  generality  that  o>0,  and  no  b,  is  zero,  since  otherwise  /  is 
block  diagonal,  and  each  diagonal  block  may  be  analyzed  separately.  The  recurrences 
corresponding  to  (2.1)  foi  J-xI  and  J  +  hJ-xI  may  be  written 

Vl   =    -X 
iij  =  -* 

ut  +  x  =  -X  -  bi  I  Uk 

Vi  +  ,  =   -X  -  a^b*  I  V, 

Since  both  J  and  7  +  6/  have  nonzero  offdiagonals,  they  must  have  simple  eigenvalues  [Par- 
lett,  p.  124]  X,  and  X/,  respectively.  As  long  as  x  is  not  one  of  the  ii(n-l)  eigenvalues  of 
leading  principal  submatrices  of  J  and  J  +  hJ,  no  division  by  zero  will  occur  in  these 
recurrences.  Also,  u„  =  0  if  and  only  if  x  is  an  eigenvalue  of  J,  and  v„=0  if  and  only  if  x  is  an 
eigenvalue  of  7  +  67. 

Our  goal  is  to  show  that  each  X,  is  the  i-th  eigenvalue  of  some  symmetric  matrix  7(X,) 
which  differs  from  7  +  67  by  a  matrix  X  =  J  +  67-7(X,)  satisfying 

(A-'-l)X,  £  X„u,(X)  S  X„„(X)  £   (A-l)X,       if  X,2:0 
(A-l)X,s  X„,„(X)s  X„„(X)s  (A-'-l)X,      ifX,<0    • 
together  with  Theorem  1  these  inequalities  will  yield  the  desired  result. 
We  construct  7(X,)  as  follows.  Let 

Wj  =  Uj-a'~"  if /Si 


(2.3) 


Vj  =  ttj-a*    "  if  j>i 


Note  that  the  wj  satisfy  the  recurrence 

(-1)'"^ 

H-^  +  i  =   -xa<-'>'"''"'  -  bj/wj     if  ;<i 
Wj  +  i  =   — xa  —  a^b}/Wj  ' 

Wj  +  i  =   -xo<""'"''  -  bj/wj     iij>i 
or 

H-i    =     -X-X, 

Wj  +  1   =    -X-X;+i   -  bj/wj      if  j<i 

w,  +  ,  =   -x-x,  +  ,  -  a^bf/w,  ' 

Wj  +  i  =  -x-Xj  +  i  -  b^/wj     if  j>i 

which  is  the  recurrence  for  7(x)  =  J+hJ-X  where  X  =  diag(x,),  x,  =  (a-'-l)x.  Now  set 
x=X,.  Since  the  wj  and  uj  sequences  have  the  same  signs  by  construction  (including 
i4„  =  H'„  =  0),  X,  is  the  <-th  eigenvalue  of  7(X,).  Further,  X„„(X)  and  X„u,(X)  clearly  satisfy 
(2.3)  above,  n 

As  an  immediate  corollary  wc  get 

Corollary  1:  Let  7  be  an  n  by  n  symmetric  tridiagonal  matrix  with  zero  diagonal  and  offdiag- 
onal  entries  fci,  .  .  .  ,fc„-i.  Let  7  +  67  have  off  diagonal  entries  oifc),  .  .  .  ,a„_ifc„_i,  a,#0. 


Let  A  =    n  n>ax(|a,|  ,  |a,  '  I).    Let  Xj  be  the  eigenvalues  of  /  sorted  into  decreasing  order, 

1-1 
and  X/  be  the  eigenvalues  of  /  +  8y  similarly  sorted.  Then 

1         ^.' 
A  X, 

For   example,   if   l-i)S  |a|sl  +  T),    no   eigenvalue   can   change    by    a   factor   exceeding 
A  =  (l-T,)-*'. 

We  can  apply  Theorem  2  to  prove  a  similar  theorem  for  singular  values  of  bidiagonal 
matrices  by  noting  that  for  any  matrix  0  the  eigenvalues  of 

0   B^l 


B'  - 


B     0 


are  the  singular  values  of  B,  their  negatives,  and  some  zeros  (if  B  is  not  square)  [Golub  and 
Van  Loan,  p.  286].  Suppose  now  that  fi  is  n  by  n  and  bidiagonal  with  diagonal  entries 
j|,  .  .  .  ,5„  and  superdiagonal  entries  «|,  .  .  .  ,e„-i.  Then  by  permuting  the  rows  and 
columns  of  5 '  to  appear  in  the  new  order  l,n  +  l,2,n  +  2,  •  •  •  ,n,  2n,  we  see  B'  is  orthogo- 
nally similar  to  the  tridiagonal  matrix  B"  with  zeros  on  the  diagonal  and  offdiagonals 
*!,«], J2.<2.  ■  ■  ■  y'm-\,^m  [Golub  and  Kahan,  p.  213].  Thus  the  singular  values  of  B  are  the 
aijsolutc  values  of  the  eigenvalues  of  the  matrix  B"  which  is  of  the  form  required  by 
Theorem  2.    This  proves 

■iB  a. 


Corollary   2:   Let  B  be  an  n  by  n   bidiagonal  matrix   and   suppose  6B„  +  Bu 

in  -1 

6Bi,,  +  i  +  B,.,  +  i  =  a2,B,,  +  ,,  a^/0.  LetA=  H    m"(l«.|  .   laTM)-    Leta,a 


=   02. 


^a.  be  the 


singular  values  of  B,  and  let  a/: 


^  CT„ '  be  the  singular  values  of  B  +  6B .    Then 
A    . 


a, 


For  example,  if  1-t)  s   |oij  I  ^  l  +  'H.  ti>"  no  singular  value  can  change  by  more  than  a  fac- 
tor of  A  =  (1-T))'-^". 

That  this  result  is  essentially  best  possible  may  be  seen  by  considering  the  n  by  n  matrix 

1-T,      3(1+Tl) 

B(ti)  = 

3(1  +  T,) 
1-T, 

When  ^»l,  the  smallest  singular  value  is  approximately  p'~"(l-(2n  -1)t)). 

This  theorem  may  be  contrasted  with  the  following  classical  perturbation  bound  for 
singular  values,  where  it  is  only  possible  to  bound  the  absolute  perturbation  in  the  singular 
values  of  a  perturbed  general  matrix: 

Theorem  3:  [Golub  and  Van  Loan,  p.  286]  Let  cti&  •  •  •  ^<7„  be  the  singular  values  of  A, 
and  cTi'a  •  ■  ■  2:ct„'  be  the  singular  values  of  A  +hA.  Then  |cT,'-a,|  s  ||6A||. 

One  caveat  about  the  use  of  Corollary  2  in  practice  is  that  phase  1  of  the  SVD  algo- 
rithm, reduction  to  bidiagonal  form,  may  produce  completely  inaccurate  bidiagonal  entries. 
Sometimes,  however,  the  reduction  to  bidiagonal  form  is  quite  accurate,  so  that  the  singular 
values  of  the  original  matrix  can  be  computed  accurately;  we  shall  discuss  such  a  situation  in 
section  5  below. 

In  section  6  we  will  show  how  to  use  recurrence  (2.1)  in  practice  to  compute  the  singu- 
lar values  of  a  bidiagonal  matrix  with  guaranteed  high  relative  accuracy.  This  method,  though 


not  competitive  in  speed  on  a  serial  machine  with  the  algorithm  of  the  next  section,  can  be 
used  to  efficiently  verify  the  accuracy  of  the  singular  values  computed  by  another  method. 
The  algorithm  based  on  (2.1)  may  also  be  parallelized  easily  (see  section  6). 

The  second  result  of  this  section  tells  us  when  we  can  set  an  offdiagonal  of  B  to  zero 
without  making  large  relative  changes  in  the  singular  values.  This  theorem  will  justify  the 
convergence  criterion  we  describe  in  section  5  below. 

First  we  discuss  a  simple  recurrence  for  approximating  the  smallest  singular  value  of  a 
bidiagonal  matrix. 

Lemma  1:  Let  fi  be  a  n  by  n  bidiagonal  matrix  with  nonzero  diagonal  entries  Ji,  .  .  .  ,«„  and 
nonzero  offdiagonal  entries  «,,  .  .  .  ,<„-i.  Consider  the  following  recurrences: 

X„  =   |j„|  »ii  =   |iil 

foT  j  =  n-l  to  1  step  - 1  do  for  y  =  1  to  n  -  1  do  (2.4) 

X,-   =     |*;KX;^,/(Xj  +  ,    +     \ej\))  Jty4,    =     \Sj^iHi^j/(i>.J  +     \ej\)) 

Then  ||B"'||x'  =  min  Xy  and  ||B~'||r'  =  min  ji,^.  Furthermore,  letting 

a-  min(||B-'||r'  ,  ||B-'||i'),  we  have 

n->'^-||B-'||;:'sa„i,(B)Sn"^-||B-'|lxl 

n-"2-||B-'||r>  s  a„i„(fl)  s  „"2.||B-'||r'    .  (2.5) 

Proof:  By  means  of  pre-  and  postmultiplication  by  unitary  diagonal  matrices  with  diagonal 
entries  of  unit  modulus,  we  may  assume  that  s,>0  and  e,<0.  Then  B~'  is  easily  seen  to  have 
positive  superdiagonal  entries,  so  that  ||B~'||i  =  ||B~'u||i  and  ||B"'||i  =  ||u'^B"'||i,  where  u 
is  the  vector  of  all  ones.  v  =  B~^u  and  w^=u^B~^  are  easily  computed  by  back  and  forward 
substitution.  Thus  ||B~'{|x  =  max|v,|  and  ||B~'||i  =  max|H',  |.    Modifying  these  back  and  for- 

i  I 

ward  substitution  recurrences  to  compute  X|  =  l/v,  and  ji,=  1/h',  yields  the  recurrences  in 
(2.4).  Since  the  eigenvalues  of 


H  = 


0    B'' 
B     0 


are  the  positive  and  negative  singular  values  of  B, 

1|B->||  =  ||ff-'||  s  ||tf-'||.  =  max(||B->||.  ,  ||B-'-||.)  =  max(I|B-'l|x  ,  ||B-'||,)    . 

proving  the  inequality  g.^a^ia(B)  in  (2.5).  The  other  inequalities  are  standard  norm  inequali- 
ties. D 

From  Lemma  1  it  is  clear  that  if  \ej/\j  +  i  \  £  i)<l,  then  changing  ej  to  0  can  make  a 
relative  change  of  at  most  t\  in  \j  and  all  subsequent  X,,  i<j.  Thus  the  first  upper  and  lower 
bounds  on  ct„j„(B)  in  (2.5)  can  change  only  by  a  factor  of  t]  as  well.  Similar  comments  apply 
if  |ej7M.j|  s  ■i)<l.    This  suggests  the  following  criterion  for  setting  ej  to  0: 

Convergence  Criterion: 

Let  Ti<l  be  the  desired  relative  accuracy  of  computed  singular  values.    Then  if  either 
\ej/\j  +  i  |sti  or  \ej/tij\sj\,  set  ej  to  0. 

Now  we  will  state  and  prove  a  theorem  which  justifies  this  criterion.  We  will  only 
prove  the  theorem  for  the  case  |ej/Xy  +  ,  |st);  the  case  |*^/ji.y|s-n  is  analogous.  First  we  need 
some  notation.  Let  <J)(t|)  be  the  unique  positive  solution  of 

exp(2<t))  -  24)  -  1  =  -n^    ;  (2.6) 


it   is   easy   to   sec    that   <t)(-n)    is    asymptotically    ti/2"^    for   small   t)    and    that    for  all    ■t\, 

4>('n)  ^  T)/2"^.     Let    fl    be    a    bidiagonal    matrix    as    in    Lemma    1    with    singular  values 

ai^  •  •  ■  ^a„,  let  U  =  B  except  for  entry  ej  which  is  zero,  and  let  aj'a  •  •  •  Sa„'  be  the 
singular  values  of  U.   Let  I,(t))  be  the  interval  of  a's  such  that 

-«t.(Ti)  s  ln((T  /  a,')  s  <|.(t,)    :  (2.7) 

I,(ti)  is  essentially  the  set  of  a  which  differ  from  a/  by  a  relative  perturbation  of  at  most 
t)/2"^.  Some  of  these  intervals  may  overlap;  let  C(_r\)  denote  the  collection  of  disjoint  inter- 
vals made  of  connected  components  of  U^'C'H)-  Now  we  may  state 

i 

Theorem  4:  Let  B  and  U  be  bidiagonal  matrices  as  described  above,  and  suppose 
\ej/\j  +  i  Ist).  Then  each  singular  value  cr,  of  B  lies  in  the  connected  component  of  C(n)  con- 
taining 1,(7)).  In  particular,  if  that  connected  component  consists  of  m  overlapping  intervals 
Ijii\),  then 

-m<|)(T,)  s;  ln(a,'  /  a,)  s  m^(r))  (2.8) 

Therefore,  the  relative  perturbation  caused  in  a/  by  setting  the  offdiagonal  entry  in  8B  to 
zero  is  at  most  nr\/2^'^,  and  if  a,  is  sufficiently  separated  from  the  other  singular  values,  at 
most  Ti/2"^. 

For  example,  this  theorem  lets  us  conclude  that  setting  -n  to  0  in 

fl   V 
1    1 
a 

can  change  the  singular  values  2"^,  1,  and  2""2a  by  at  most  factors  of  l±-n  if  t)  is  small, 
independent  of  a.  Indeed,  if  D  is  any  bidiagonal  matrix  this  theorem  guarantees  that  we  can 
set  T]  to  0  in 

'l    T, 
D 

without  making  relative  perturbations  larger  than  i\  in  any  singular  value. 

The  proof  of  this  theorem  will  depend  on  a  sequence  of  technical  lemmas.   The  first  is  a 
trivial  consequence  of  Taylor's  Theorem: 

Lemma  2:  Let /and  g  be  continuously  differentiable  functions  on  the  nonnegative  real  axis, 
with  /(r)<^(;)  for  t  positive  and  sufficiently  small.  Let  £  *=  mf{t>0:  f(t)s:g(t)},  and  t="  if 
no  such  r  exist.  Then  if  £  is  finite, /'(t)a^'(t). 

We  wUl  use  contrapositive  of  this  result  to  show  when  f<g  for  all  r;  if  /(t)^«(t) 
would  imply  that/'(t)<^'(t),  then /must  be  less  than  g  everywhere. 

In  our  case,  we  define  /(f)  and  g  (r)  as  follows.  Write 

K  C 
R 

where  K  is  j  by  j,R  is  n-j  by  n-j,  and  C=ejlf,  where  /=(0 0,1)''  and /=  (1,0 0)'". 

Assume  as  in  Lemma  1  that  «y<0.    Let 


B  = 


U(t)  = 


K  C(t) 
R 


where  C  (t)= -tXj^jlf  so  that  l/(0)=t/  and  l/(Ti)  =  fi.  Let  a,'(0  be  the  Mh  largest  singular 
value  of  £/(f).   Then  we  first  let 


/(»)=-*(')     and    «(r)=ln(cr/(0/a,')  (2.9a) 

and  apply  Lemma  2  to  prove  the  first  inequality  in  (2.8),  and  then  let 

/(0  =  ln(cT.'(0/a/)     and    *(<)  =  *(/)  (2.9b) 

to  prove  the  second  inequality.    In  order  to  apply  Lemma  2  to  draw  this  conclusion,  we  need 
to  compute  the  derivatives  of  the  functions  in  (2.9). 
Lemma  3:  Unless  a,'(»)  is  a  singular  value  of  R, 

'^'"'  ^T   ((a/(0/a/)^-l)    ''^^i  "^^"'^'^^  ^  ""  ^""   ((a,'(0/<)^-l)    '  "^    " 

Proof:  We  begin  with  a  simplifying  assumption:  We  assume  K  and  R  have  no  common  singu- 
lar values.  If  this  is  not  true,  consider  a  sequence  of  problems  with  K^-K,  Rn-R  and  where 
K„  and  R„  have  distinct  singular  values;  the  general  result  will  follow  from  continuity. 

We  may  define  a  singular  value  a{i)  and  its  singular  vectors  u(t)  and  v(f)  of  U(t)  by 
the  equations  Uv^ou  and  u^U=av^  (where  we  have  suppressed  the  argument  /).  Using  the 
fact  that  u^u  =  v'^v>0,  wc  see  from  i/v+l/v  =  au  +  aii  and  from  ii  U+u^U  =  av^+(TV    that 

Now  partition  u''=(u[,U2)  and  v'"=(v[,vj)  conformally  to  B,  whence 
Kvi  +  Cv2  =  aui  u]K  =  (Tv[ 

^V2  =  auj         "*  uJC  +  uIr  =  CTv[    ■ 


Now 

U(t)  = 


0  c 
0    0 


where     C  =  -\j  +  i-lf^    . 


By  rearranging  the  recurrence  (2.4)  for  Xy  +  i  we  see  X^  +  j  =  ||/?~^/l|i  '.  Thus 

^(,)  =         "i"Cv2         ^  -"i"y^v2  .2  11) 

u[u,  +  u[u2  ||/J-''yi|i(«[u,  +  «Iu2) 

Now  we  derive  another  expression  for  vj  in  order  to  eliminate  it  from  (2.11).  Since 
R'^Rv2  =  aR'^U2  =  a^V2-aC'"u,  =  a^V2  +  c7r/7''M,  ||«"^/l|f' 
we  may  solve  for  V2  as  follows  provided  a  is  not  a  singular  value  of  R: 

V2  =  at/'"u,(/J^«-a2/)-7||J?-^/l|r'  =  <Tf/''«,/?-'(/-a2(«''/?)-')-'/?-7||'f"'"/lir' 
and  so 

4ln(a(0)  =  <^(0/a(0  =  ,.^(^:!4-.M44.I^I™^i^^  .    (2.12) 

Since  Ms  a  unit  vector,  the  second  factor  in  this  expression  is  between  0  and  1.  It  cannot  be 
zero  because  otherwise  c'^u i  =  0,  Rv2  =  au2  and  u2R  =  av2,  and  so  a  would  be  a  singular 
value  of  R  contrary  to  assumption.  The  third  factor  is  strictly  between  0  and  1.  The  last  fac- 
tor is  a  Rayleigh  quotient  and  so  bounded  by  the  extreme  eigenvalues  of  the  matrix  in  the 
middle,  i.e.  min  ((a/ajjj^- 1)"',  and  max  ((a/ayfj)'- 1)"',  where  aj(j  are  the  singular  values 

of  R.  This  is  in  turn  bounded  by  the  extreme  values  of  l/((ff/a'j)^- 1).  This  proves  the 
lemma.  D 

Lemma  4:  <J)(0)  =  0  and  4i(0)=2~"^.  ^(t)  satisfies  the  differential  equation 


*(')  =    cxp(24.(r))  -  1  ^2-"») 

and  <Ji(r)  =  -<j>(f)  satisfies  the  differential  equation 

*^')  =  1  -  c,p;-2^(0)  <^-i^'') 

Proof:  Simply  differentiate  the  defining  equation  (2.6)  for  4>(»).  Q 

Proof  of  Theorem  4:  Now  note  that  ln(a,'(0)/a/)  =  0  and  its  derivative  d,'(0)/a/(0)  =  0  as 
well  since  a,'(f)  is  an  even  function  of  t.  Since  <J)(0)  =  0  and  <t)(0)  =  2""^,  we  see  that  equation 
(2.8)  is  true  (for  m  =  1)  for  sufficienUy  small  r\.  To  show  it  is  true  for  all  if,  we  assume  to  the 
contrary  that  there  is  some  positive  -n  for  which  it  is  false,  and  let  %  be  the  infimum  of  all 
these  T).  Then  a/(0  will  be  on  the  boundary  of  C(ii),  which  means  |ln(CT,'(4)/aj'  |  will  be  at 
least  <J)({)  for  all  j.  From  Lemma  3  we  see  this  implies 

l-cxp("-24>(0)  <  i^(''''«)''^'')  <  exp(2<;(0-l    ' 
But  we  also  have  from  Lemma  4  that   that 


*(t)  =    -^ /  -,,/.^^       ^^^      *(S)  = 


l-eip(-2vl/(£))  '  '        exp(2(|>(4))-l     " 

Therefore,  the  choice  (2.9a)  of  /  and  «  yields  /(0<i(6).  so  /({)  cannot  equal  ^(5).  The 
choice  (2.9b)  yields  the  same  conclusion.  Therefore,  a,'(|)  cannot  lie  on  the  boundary  of 
C(4)  as  supposed.  This  completes  the  proof  of  Theorem  4.  D 


3.  QR  iteration  witti  •  lero  shift 

The  standard  algorithm  for  finding  singular  values  of  a  bidiagonal  matrix  £  is  the  QR 
algorithm  applied  implicitly  to  B'"b  [Golub  and  Kahan].  Part  of  the  algorithm  involves  a  shift 
a,  which  is  usually  taken  to  be  the  smallest  eigenvalue  of  the  bottom  2  by  2  block  of  BB''^. 
The  algorithm  computes  a  sequence  B,  of  bidiagonal  matrices  starting  from  Bo=B  as  follows. 
The  algorithm  implicitly  computes  a  QR  factorization  of  the  shifted  matrix  bJBi-oI  =  QR, 
where  Q  is  orthogonal  and  R  upper  triangular,  and  computes  a  bidiagonal  Bt  +  i  such  that 
Bf+,S,  +  i  =  RQ  +  al.  As  /  increases,  B,  converges  to  a  diagonal  matrix  with  the  singular 
values  on  the  diagonal. 

The  roundoff  errors  in  this  algorithm  are  generally  on  the  order  of  «||B||,  where  c  is  the 
precision  of  the  floating  point  arithmetic  used.  From  Theorem  3  of  the  last  section,  this 
means  we  would  expect  absolute  errors  in  the  computed  singular  values  of  the  same  order. 
In  particular,  tiny  singular  values  of  B  could  be  changed  completely. 

In  this  section  we  present  a  variation  of  the  standard  algorithm  just  described  which  can 
compute  all  the  singular  values  of  a  bidiagonal  matrix  with  guaranteed  high  relative  accuracy, 
even  the  tiniest  ones.  We  will  call  this  the  "implicit  zero-shift  QR"  algorithm,  since  it 
corresponds  to  the  above  algorithm  when  a  =  0.  However,  it  is  organized  in  such  a  way  as  to 
guarantee  that  each  entry  of  B,  +  i  is  computed  from  Bj  to  nearly  full  machine  precision. 
Corollary  2  of  the  last  section  then  implies  that  the  singular  values  of  B,  and  B,  +  i  all  agree  to 
high  relative  accuracy.  When  B,  +  i  has  finally  converged  to  a  diagonal  matrix,  these  diagonal 
entries  must  therefore  also  be  accurate  singular  values  for  the  initial  B=Bo.  Exactly  how  to 
detect  this  convergence  is  an  interesting  issue  and  discussed  in  the  next  section. 

The  rest  of  this  section  is  organized  as  follows.  First  we  review  the  standard  algorithm 
for  singular  values  of  a  bidiagonal  matrix.  Then  we  show  how  it  simplifies  when  the  shift  is 
zero.  We  then  discuss  an  error  analysis  of  the  resulting  implicit  zero-shift  QR  to  show  that  it 
computes  each  entry  of  B,  +  i  with  high  relative  accuracy  (the  details  of  the  error  analysis  are 
in  section  8).  Finally,  we  discuss  the  asymptotic  convergence  rate. 

The  final  algorithm  is  a  hybrid  of  the  standard  QR  and  implicit  zero-shift  QR.  Stan- 
dard QR  is  used  when  the  roundoff  errors  are  guaranteed  not  to  make  unacceptably  large 
perturbations  in  the  smallest  singular  values  of  B,  and  implicit  zero-shift  QR  is  used  other- 
wise. Thus,  the  hybrid  algorithm  is  equivalent  to  the  standard  algorithm  on  matrices  whose 
condition  number  (the  ratio  of  the  largest  to  smallest  singular  values)  is  modest.  The  hybrid 
algorithm  will  be  discussed  more  fully  in  the  next  section. 

In  order  to  summarize  the  standard  QR  algorithm,  we  need  some  notation.  Let  J (ij,^) 
denote  the  Given's  rotation  in  entries  i  and  j  by  angle  6.  In  other  words,  J(i,j,6)  is  an  n  by  n 
identity  matrix  except  for  rows  and  columns  i  and  j  whose  intersections  consist  of  the  follow- 
ing 2  by  2  rotation  matrix: 

cos6     sin6 
-  sine   cose 

Given  the  vector  x,  choosing  6  so  that  x;/i,  =  tane  means  that  the  /-th  and  y-th  entries  of 
J(.',j,^)x  will  contain  i:'^{x^  +  y})  and  0,  respectively. 

We  will  illustrate  the  algorithm  on  a  4  by  4  example,  where  we  use  x  and  -♦•  to  indicate 
nonzero  entries  and  0  and  blank  to  indicate  zero  entries.  Initially  B,  is  in  the  form 


B,  = 


X    X 
X    X 

X    X 
X 
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We  begin  by  postmultiplying  B,  by  /,  ■  7(1, 2,61),  where  6]  will  be  discussed  in  a  moment. 
This  introduces  a  nonzero  entry  in  the  (2,1)  position: 


Bp/i  = 


X      X 
■^     X    X 

X    X 

X 


(3.1) 


B^J X  will  now  be  pre-  and  postmultiplied  by  a  sequence  of  Givcn's  rotations  whose  purpose  is 
to  "chase  the  bulge"  indicated  by  "+"  off  the  end  of  the  matrix.  Choose  62  so  that 
Ji  ■  / (1,2,62)  introduces  a  zero  in  the  (2,1)  entry  oiJ-iBJx- 


J-fi^x  = 


X    X     + 

0  X    X 

X      X 
X 


(3.2) 


Next  choose  63  in  J^  «  /(2.3,e3),  64  in  J^  =  7(2,3,64),  65  in  /j  -  /(3,4,65),  and  65  in 
■fe  •=  -^(3,4,66),  to  give  the  following  sequence  of  transformations: 


J2B,JxJi   = 


X    X    0 

X      X 

+     X    X 

X 


J*J2B,J\J-i   — 


X    X 

X    X     + 

0   X    X 

X 


J^J2B,JiJiJs   = 


X     X 

X    X  0 

X  X 

+  X 


*i  +  l   "  J 6^ *J 2B jJ \J iJ i   = 


X    X 
X    X 
X     X 

0   X 


The  usual  error  analysis  of  Given's  rotations  [WUkinson,  p.  131-139]  shows  that  B.  +  j  is  the 
exact  transformation  of  a  matrix  B,  +  E  where  ||£||  is  on  the  order  of /(n)e||fi,||,  f(n)  a  mod- 
est function  of  n. 

To  choose  6)  we  compute  a  shift  a  which  is  generally  the  smallest  eigenvalue  of  the  bot- 
tom right  2  by  2  submatrix  of  B,bJ.  6,  is  then  chosen  so  /,  introduces  a  zero  into  the  (2,1) 
entry  of  /,  (BjB,-al).    It  is  easy  to  see  that  this  means  that 


tanei  = 


(BjB,),2 
'-(BjB.h 


(3.3) 


This  choice  of  shift,  caUed  Wilkinson's  shift,  guarantees  at  least  Unear  convergence  and  gen- 
eraUy  yields  asymptotic  cubic  convergence  of  the  offdiagonal  entries  of  B,  to  zero  [Parlett,  p. 
151].   This  is  assuming  arithmetic  is  done  exactly. 

Now  let  us  take  a  =  0.  Let  us  also  drop  the  subscript  /  on  B,  for  simplicity  of  notation. 
From  (3.3)  we  see  that  tanO,  =  -*i2/*ii  so  that  the  result  of  the  first  rotation  (for  a  4  by  4 
matrix)  is 


fl(i) 


fl/,  = 


b[y   0 

b^l'    *iS?     *23 

''33     *34 
*44 
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We  let  the  superscript  on  the  matrix  and  its  entries  indicate  that  /j  has  been  applied.  Com- 
paring to  (3.1)  we  see  that  the  (1,2)  entry  is  zero  instead  of  nonzero.  This  zero  will  pro- 
pagate through  the  rest  of  the  algorithm  and  is  the  key  to  its  effectiveness.  After  the  rotation 
by  Ji  we  have 


MV  h^  MV 

B")  .  /jB/,  = 

0    tg*  fcS> 

*33     *34 

^44 

Note  that 

sine^fc^lr*    8in62*23 

cose2b^    00562^23 

i.e.  it  is  a  rank  one  matrix.  Therefore,  postmultiplication  by  ^3  to  zero  out  the  (1,3)  entry 
will  also  zero  out  the  (2,3)  entry: 


fl(3'   -»  /2B-^l-'3 


0 

h^      0 


biA 
biA 


Comparing  to  (3.2)  we  see  that  there  is  an  extra  zero  on  the  superdiagonal.  Rotation  by  J^ 
just  repeats  the  situation:  the  submatrix  of  JiJiBJ^Ji  consisting  of  rows  2  and  3  and  columns 
3  and  4  is  rank  one,  and  rotation  by  Jf,  zeros  out  the  (3,4)  entry  as  well  as  the  (2,4)  entry. 
This  regime  repeats  itself  for  the  length  of  the  matrix. 

The     following     algorithm     incorporates     this     observation.     It     uses     a     subroutine 
ROT(/,g,cs,sn,r)  which  takes/and  g  as  inputs  and  returns  r,  cj  =  cose  and  in  =  sine  such  that 

(3.4) 


cs      sn 
-sn    cs 

f 

= 

r 
0 

ROT(f,g,cs,sn,r):  takes  /and  g  as  input  and  returns  cs,  sn  and  r  satisfying  (3.4). 

if  C/  =  0)  then 

cs  =  0;  sn  =  1;  r  =  g 

elseif  (|/|>|«|)  tji£ii_ 

t   =   glf\U=    Vl+,2 

CS  =  lltt;  sn  -  fcs\  r  =  ftt 
else  

I  =  //«;  ti  =  Vl  +  ,2 

sn  =   l/«;  cs  =  t*sn;  r  =  g*n 
endif 

Barring  underflow  and  overflow  (which  can  only  occur  if  the  true  value  of  r  itself  would 
overflow),  ROT  computes  cs,  sn  and  r  to  nearly  full  machine  accuracy  (see  section  8  below 
for  details).  It  also  uses  fewer  operations  than  the  analogous  routine  "rotg"  in  LINPACK 
[UNPACK]. 
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Implicit  Zero-Shift  QR  Algorithm:  Let  fi  be  an  n  by  n  bidiagonal  matrix  with  diago- 
nal entries  jj,  .  .  .  ,j„  and  superdiagonal  entries  «i,  .  .  .  ,e,-i.  The  following  algo- 
rithm replaces  j,  and  e,  by  new  values  corresponding  to  one  step  of  the  QR  iteration 
with  zero  shift: 

oldcs  =  1 
/=*(!) 
g  =  *(1) 
fori  =  l,n-l 

call  ROT(f,g,cs,sn,r) 

if  (/9tl)  «(/-!)  =  oldsn*r 

f  =  oldcs*  r 

g  =  j(i  +  l)«in 

h  =  s{l  +  l)'cs 

call  ROT(/,g,cs,sn,r) 

,(.)  =  r 

/=  h 

8  =  <('  +  l) 

oldcs  =  cs 

oldsn  =  sn 
endfor 

«(n-l)  =  /i»jn 
j(n)  =  A*cj 

It  is  straightforward  to  verify  that  this  algorithm  "chases  the  bulge"  in  the  manner 
described  above.  It  is  remarkable  that  outside  the  two  calls  to  ROT,  there  are  only  4  multipli- 
cations in  the  inner  loop.  This  is  to  be  contrasted  with  the  usual  QR  algorithm,  which  in  addi- 
tion to  two  calls  to  ROT  has  12  multiplications  and  4  additions.  Thus  the  inner  loop  is  much 
more  efficient  than  the  standard  algorithm.  Note  also  that  it  is  eminently  parallelizable, 
because  nil  rotations  can  be  done  at  once.  Since  data  need  only  be  passed  serially  along  the 
diagonal,  it  can  also  be  implemented  in  a  systolic  array. 

This  algorithm  is  also  much  more  accurate  than  the  standard  algorithm.  The  source  of 
the  extra  accuracy  is  the  absence  of  addition  and  subtractions,  which  means  all  roundoff 
errors  appear  multiplicatively  (there  is  an  addition  in  ROT,  but  it  is  harmless).  Our  model  of 
arithmetic  is  the  usual  one: 

flixoy)  =  {xoyy{\  +  e)  (3.5) 

where  <>  is  one  of  -t- ,  - ,  •  and  /,  flix'y)  is  the  floating  point  result  of  the  operation  «,  and 
\e  \  s  €,  where  «  is  the  machine  precision.  This  would  appear  to  eliminate  machines  like  the 
Cray  and  Cyber  from  consideration,  since  those  machines  do  not  conform  to  (3.S)  for  addi- 
tion and  subtraction  when  cancellation  is  involved,  but  since  we  only  need  to  use  (3.5)  for 
multiplication  (as  well  as  square  root,  division,  and  addition  of  positive  quantities  in  ROT), 
this  analysis  covers  those  machines  as  well.  We  also  assume  overflow  and  underflow  do  not 
occur  (we  return  to  these  issues  in  section  9). 

We  present  two  theorems  about  the  accumulation  of  error  in  the  algorithm.  The  proofs 
are  given  in  section  8.  The  first  theorem  develops  a  bound  for  the  relative  error  in  the  com- 
puted s(i)  and  <(i)  of  the  form  cn€  (c  is  a  modest  constant)  and  uses  it  with  Theorem  2  of 
section  2  to  show  that  the  relative  difference  between  the  singular  values  of  the  bidiagonal 
matrix  B  and  the  output  matrix  B'  of  the  implicit  zero-shift  QR  algorithm  is  cn^e/(l-cn^€). 
In  other  words,  the  relative  error  in  the  computed  singular  values  can  only  grow  with  the 
square  of  the  dimension. 
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Theorem  5:  Let  fi  be  an  n  by  n  bidiagonal  matrix  and  B '  the  matrix  obtained  by  running  the 
implicit  zero-shift  QR  algorithm  on  B.  Let  the  singular  values  of  fi  be  ai^  ■  ■  ■  ^a„,  and 
the  singular  values  of  B '  be  ct  i '  a  •  •  •  a  c7„ ' .  Then  if 

o)  -  69n^€<l    ,  (3.6) 

the  relative  differences  between  the  singular  values  of  B  and  the  singular  values  of  B'  are 
bounded  as  follows: 

k,-a/|  s  -r^^—ai    ■ 

1  —  0) 

Let  Bt  be  the  matrix  obtained  after  ik  repetitions  of  the  implicit  zero-shift  QR  algorithm,  and 
let  <Jk\^  •  •  •  ^oia,  be  its  singular  values.   Then  if  condition  (3.6)  holds  we  have 

k.-<Tbl  S  (.,    Vt    -  1)  •  «^.  =  69*n2e-a.    . 

where  the  approximation  to  the  last  upper  bound  holds  if  iuxCl- 

This  result  is  actually  rather  pessimistic,  as  our  second  result  shows:  when  we  approach 
convergence  in  the  sense  that  all  rotations  are  through  angles  bounded  away  from  -n/l,  errors 
do  not  accumulate  at  all  and  the  error  in  the  computed  e{i)  and  s{i)  is  bounded  by  c'-e,  c' 
another  modest  constant.  With  Theorem  2  this  yields  an  error  bound  on  the  computed  singu- 
lar values  of  the  form  c'n€/(l-c'ne). 

Theorem  6:  Let  5  be  an  n  by  n  bidiagonal  matrix  and  B'  the  matrix  obtained  by  running  the 
implicit  zero-shift  QR  algorithm  on  B.  Assume  that  all  the  rotation  angles  6  during  the  course 
of  the  algorithm  satisfy  sin^e  s  t  <  1.  Let  the  singular  values  ol  B  be  ai^  ■  ■  ■  s^a„,  and 
the  singular  values  of  S '  be  a/a  •  •  •  &a„'.  Then  if 

-  =  -^^r;^<i.  (3.7) 

the  relative  differences  between  the  singular  values  of  B  and  the  singular  values  of  B '  are 
bounded  as  follows: 

\<T,-a,' 


1-0) 


Let  Bj  be  the  matrix  obtained  after  k  repetitions  of  the  implicit  zero-shift  QR  algorithm, 
where  we  assume  all  rotation  angles  6  satisfy  sin^e  s  t  <  1.  Let  ajj^  •  •  •  Scjib,  be  the 
singular  values  of  B^.    Then  if  condition  (3.7)  holds  we  have 

where  the  approximation  to  the  last  upper  bound  holds  if  toxcl- 

Note  that  t  can  easily  be  monitored  by  the  algorithm  as  it  proceeds. 

The  standard  algorithm  does  not  always  achieve  this  accuracy  for  three  reasons.  First, 
the  convergence  criteria  in  the  standard  algorithm  can  change  small  singular  values  com- 
pletely (this  is  discussed  in  detail  in  the  next  section).  Second,  rounding  errors  committed 
while  "chasing  the  bulge"  with  a  large  shift  can  obscure  small  matrix  entries  and  small  singu- 
lar values.  Third,  roundoff  errors  when  the  shift  is  zero  result  in  nonzero  entries  appearing 
and  propagating  in  those  offdiagonal  entries  of  intermediate  results  which  should  be  zero, 
and  which  are  kept  zero  by  the  new  algorithm.  This  third  effect  seems  mild,  however,  and  as 
a  result  the  standard  algorithm  sometimes  computes  small  singular  values  with  higher  relative 
accuracy  than  the  usual  bound  /(n)€||A||  would  lead  us  to  expect  (see,  for  example,  the 
numerical  examples  of  Class  1  in  section  7). 
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The  pattern  of  zeros  above  the  diagonal  during  the  QR  sweep  also  appears  when  apply- 
ing QR  to  a  symmetric  tridiagonal  matrix.  This  pattern  can  be  exploited  to  give  fast,  square 
root  free  versions  of  the  algorithm  (see  [Parlett,  p.  164]  for  a  discussion).  Unfortunately,  this 
does  not  yield  forward  stability  and  high  accuracy  as  it  does  for  the  bidiagonal  case. 

Finally,  wc  discuss  the  asymptotic  convergence  rate  of  the  algorithm.  It  is  well  known 
that  unshifted  QR  on  a  symmetric  matrix  is  essentially  the  same  as  inverse  iteration  [Parlett, 
p.  144].  Therefore  we  can  conclude  that  the  last  off  diagonal  element  e„-i  should  converge  to 
zero  linearly  with  constant  factor  aj-j/aj.  If  there  is  a  cluster  of  m  small  singular  values 
isolated  from  the  remaining  ones,  e„-„  will  converge  to  zero  linearly  with  constant  factor 

ol-m  +  l' 


i/a5_». 
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4.  Convergence  Criteria 

In  this  section  we  discuss  convergence  criteria  for  the  new  algorithm,  and  describe  the 
practical  version  of  the  algorithm,  which  is  a  hybrid  of  the  usual  shifted  QR  and  the  implicit 
zero-shift  QR.  The  same  analysis  leading  to  the  convergence  criterion  will  lead  to  a  criterion 
for  switching  from  zero-shift  QR  to  shifted  QR  without  damaging  any  tiny  singular  values. 
The  switching  criterion  depends  on  a  user  specifiable  tolerance  tol  which  is  the  desired  rela- 
tive accuracy  in  the  singular  values  {tol  should  be  less  than  1  and  greater  than  «).  The  result- 
ing hybrid  algorithm  will  therefore  run  about  as  fast  as  the  standard  algorithm  on  matrices 
without  any  tiny  singular  values.  We  will  also  discuss  convergence  criteria  in  the  case  where 
one  is  only  interested  in  absolute  precision  in  the  singular  values. 

We  begin  by  discussing  the  convergence  criteria  used  in  the  current  version  of  the  algo- 
rithm [LINPACK],  and  explain  why  they  arc  unsuitable  for  our  algorithm.  The  code  in  LIN- 
PACK  has  two  tests  for  setting  entries  of  the  bidiagonal  matrix  B  to  zero.  Recall  that 
*i,  .  .  .  ,j„  are  the  diagonal  entries  of  B  and  ei,  .  .  .  ,«„-i  are  the  off  diagonal  entries.  The 
first  test  is 

if  (|e,|  +   k.-il  +   U.I  =   I'.l  +   l«.-il).    set  5,  too  (4.1) 

This  rather  enigmatic  looking  test  works  as  follows.  If  \s,\  <  .5-€-(le,|  +  |«,-i|),  the  test 
will  be  satisfied  and  s,  set  to  zero.  In  other  words,  (4.1)  is  a  way  of  asking  whether  one 
number  is  less  than  roundoff  error  in  another  number  without  needing  to  know  the  machine 
precision  e  explicitly.  The  other  convergence  test  is 

iii\s,\+   U,-,|+   |«.-il=   \sA+   U,-il).    set«,-itoO  (4.2) 

Both  tests  compare  an  entry  x  of  B  with  its  two  nearest  neighbors  on  the  other  diagonal,  and 
set  X  to  zero  if  it  is  negligible  compared  to  those  neighbors.  One  justification  for  these  tests  is 
that  roundoff  error  during  the  rotations  could  make  the  matrix  indistinguishable  from  one 
with  a  zero  in  x's  position.  Also,  they  clearly  introduce  errors  no  worse  than  /(n)€||i4l|. 

Both  tests  are  unsatisfactory  for  our  algorithm.  Test  (4.1)  introduces  a  zero  singular 
value  where  there  was  none  before,  so  it  is  clearly  unsatisfactory.  The  following  example 
shows  why  (4.2)  is  also  unsatisfactory.  Suppose  t)  is  sufficiently  small  that  in  floating  point 
arithmetic  l-i-i)=l.  Consider  the  matrix 


B(x)  = 


n^    1 

1    X 


1     1 
,2 


When  x  =  T)  it  is  easy  to  verify  that  the  smallest  singular  value  of  B  (t])  k,  about  t]'.  Test  (4.2) 
would  set  X  to  0,  but  fi(0)  has  a  smallest  singular  value  of  about  ti^/V2,  which  is  utteriy  dif- 
ferent. 

Our  convergence  criterion  must  guarantee  that  by  setting  some  e,  to  0  (clearly  no  j,  can 
ever  be  set  to  zero),  no  singular  value  is  perturbed  too  much.  The  simplest  such  criterion 
would  only  set  e,  to  zero  if  it  were  less  than  tofa,  where  a  is  a  reliable  estimate  or  underes- 
timate of  the  smallest  singular  value.  Such  a  a  is  provided  by  the  recurrences  in  (2.4).  How- 
ever, this  method  is  overly  conservative,  and  generally  waits  much  too  long  to  set  e,  to  0.  A 
much  better  estimate  is  given  in  section  2  and  justified  by  Theorem  4.    We  repeat  it  here: 

Convergence  Criterion: 

Let  \j  and  fij  be  computed  by  the  recurrences  in  (2.4).     If  cither    \ej/\j  +  i\^tol  or 
\ejliLj\stol,  set  ej  to  0. 
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This  criterion  is  more  expensive  than  the  standard  LINPACK  criterion,  but  avoids  situa- 
tions like  setting  jr  to  0  in  fl  (x)  in  the  last  paragraph,  and  recognires  that  setting  x  to  zero  in 

X 

D 

is  harmless  if  \x  \^tol,  independent  of  D. 

Now  we  consider  how  to  decide  whether  to  use  implicit  zero-shift  QR  or  standard 
shifted  QR.  In  order  to  estimate  the  rounding  errors  which  would  occur  during  shifted  QR, 
we  need  an  estimate  of  ||B||.  We  wiU  use  a  ■  max(|j(j)l,  |e(i)|),  which  is  easily  seen  to 

underestimate  ||fi||  by  no  more  than  a  factor  of  2.  In  terms  of  a,  a,  and  tol,  our  decision 
algorithm  is 

if  (1.  -I-  /•  tol  •  {o_l^  =  1.) 

use  the  implicit  zero-shift  QR 
else 

use  shifted  QR 
endif 

The  test  tests  whether  the  rounding  errors  c-ct  which  shifted  QR  could  introduce  are  greater 
than  the  largest  tolerable  perturbation  tola.  It  is  stated  in  a  machine  independent  way  which 
avoids  using  «  explicitly.  The  factor  /^l  is  a  fudge  factor  which  makes  zero-shifting  less 
likely  on  tight  clusters  of  singular  values;  we  currently  use /=min(n  -  l,m)  if  the  original 
matrix  was  n  by  m. 

In  practice  there  is  one  further  test  for  using  the  implicit  zero-shift  QR.  If  the  above 
test  chooses  shifted  QR,  we  must  still  compute  the  shift  a,  which  is  the  smallest  eigenvalue 
of  the  bottom  2  by  2  matrix  of  fifl^.  (We  also  make  sure  a  is  not  larger  than  a  small  multiple 
of  the  smallest  diagonal  entry,  in  order  to  avoid  underflow  difficulties.)  From  (3.3),  we  see 
the  tangent  of  the  first  rotation  angle  is  given  by 

^(i)''(i)    =  ini  n  -  — 2_r> 

(J(l))2-a  5(1)  (,(l))2-' 

so  if  1-  a/(j(l))2  rounds  to  1,  the  first  rotation  is  the  same  as  in  implicit  zero-shift  QR  and 
we  might  as  well  use  it,  since  it  is  faster  and  more  accurate. 

The  choice  of  tol  may  be  made  by  the  user,  or  chosen  automatically  by  the  program.  If 
tol  is  chosen  close  to  1,  one  almost  always  picks  shifted  QR,  which  still  computes  the  singular 
values  with  good  absolute  accuracy,  so  only  the  smallest  singular  values  wiU  be  inaccurate.  If 
one  chooses  tol  near  e,  one  wiU  almost  always  use  implicit  zero-shift  QR  unless  all  the  singu- 
lar values  are  very  close  together,  and  therefore  sacrifice  the  cubic  convergence  of  shifted 
QR.  See  section  7  for  descriptions  of  numerical  experiments  on  the  effect  of  varying  «. 
Choosing  tol  near  1  is  useful  for  quickly  obtaining  estimates  of  singular  values  for  rank  deter- 
mination. Note  that  as  singular  values  converge  and  are  deflated  off,  a  can  and  should  be 
reestimated  so  that  if  tol  is  not  too  small,  by  the  time  all  the  smaU  singular  values  have  con- 
verged, the  algorithm  is  doing  shifted  QR. 

Note  also  that  if  one  is  only  interested  in  computing  the  smallest  singular  value  or 
values,  a  provides  a  test  for  stopping  the  iteration  early.  If  one  or  several  small  singular 
values  have  been  deflated  out,  and  the  a  for  the  remaining  matrix  exceeds  them  sufficiently, 
one  is  guaranteed  to  have  found  the  smallest  singular  value.  A  similar  idea  is  cxoresscd  in 
[Van  Huffel,  et  al]. 

Finally,  we  consider  computing  the  singular  values  to  guaranteed  absolute  accuracy 
instead  of  guaranteed  relative  accuracy.  As  stated  in  the  introduction,  this  is  what  standard 
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shifted  QR  computes.  However,  the  convergence  criteria  (4.1)  and  (4.2)  in  the  current  stan- 
dard implementation  are  much  more  stringent  than  necessary  to  meet  this  goal.  Instead  of 
comparing  \e{i)\  or  |.r(i)|  to  its  neighbors  to  see  if  it  is  negligible,  it  is  only  necessary  to 
compare  to  a  =  ||B||.  In  other  words  substituting 

if  ( |j,  I  s  tol*a)   set  *,  to  0  (4.3) 

for  (4.1)  and 

if  (|e,|  s  tofW)   set  «,  to  0  (4.4) 

for  (4.2)  will  also  guarantee  absolute  accuracy  but  possibly  speed  convergence  considerably. 
In  practice  the  input  parameter  lot  could  be  used  to  choose  between  absolute  and  relative 
accuracy:  if  tol  is  F>ositive,  it  indicates  that  relative  accuracy  tol  is  desired,  and  if  tol  is  nega- 
tive, it  indicates  that  absolute  accuracy  |(o/|-a  is  desired. 
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5.  Accnrate  Eigenvaloes  of  Some  Classes  of  Symmetric  Tridiagonal  Matrices 

Id  this  section  we  show  how  to  use  the  implicit  zero-shift  QR  algorithm  to  compute 
eigenvalues  of  two  classes  of  symmetric  tridiagonal  matrices  with  high  relative  accuracy. 
These  matrices  share  the  property  that  small  relative  errors  in  the  matrix  entries  cause  small 
relative  errors  in  the  eigenvalues.  The  classes  are  symmetric  tridiagonal  matrices  with  zero 
diagonal,  and  positive  definite  diagonally  dominant  symmetric  tridiagonal  matrices.  We 
define  diagonal  dominance  below;  it  includes  the  possibility  of  graded  matrices. 

First  we  consider  symmetric  tridiagonal  matrices  with  zero  diagonal.  This  class  of 
matrices  was  analyzed  in  Theorem  2,  where  it  was  shown  that  the  eigenvalues  were  deter- 
mined with  nearly  the  same  relative  accuracy  as  the  matrix  entries.  Later  in  section  2  it  was 
shown  how  to  reduce  the  singular  value  problem  for  bidiagonal  matrices  to  the  eigenvalue 
problem  for  symmetric  tridiagonal  matrices  with  zero  diagonal.  The  same  idea  works  back- 
wards: the  eigenvalue  problem  for  symmetric  tridiagonal  matrices  with  zero  diagonal  can  be 
converted  into  a  bidiagonal  singular  value  problem.  When  the  dimension  n  of  the  tridiagonal 
is  even,  the  correspondence  is  immediate  with  the  eigenvalues  of  the  tridiagonal  being  the 
positives  and  negatives  of  singular  values  of  the  corresponding  bidiagonal.  When  n  is  odd, 
one  needs  the  fact  that  the  tridiagonal  will  always  have  (at  least)  one  zero  eigenvalue.  The 
tridiagonal  can  then  be  augmented  with  one  extra  zero  row  and  zero  column  and  then  be  con- 
verted into  a  bidiagonal  singular  value  problem  with  (at  least)  one  zero  singular  value. 

The  second  class  of  symmetric  tridiagonal  matrices  arc  positive  definite  ones  whose 
eigenvalues  are  known  to  be  insensitive  to  small  relative  perturbations  in  each  entry.  If  T  is 
such  a  matrix,  the  algorithm  we  will  propose  wiU  accurately  find  the  eigenvalues  of  a  matrix 
T  +  hT  where  |67"|  s  3e|r|,  and  in  fact  the  diagonal  of  6r  is  zero.  In  other  words  BT  is  a 
small  componentwise  relative  perturbation  of  T.  Since  the  eigenvalues  of  T  are  insensitive  to 
such  perturbations,  the  computed  eigenvalues  will  approximate  them  accurately.  Even  if  the 
eigenvalues  of  T  cannot  be  shown  to  be  insensitive  to  such  small  perturbations,  the  algorithm 
will  still  accurately  compute  the  eigenvalues  of  a  matrix  differing  from  T  in  only  a  few  units 
in  the  last  place  of  each  component;  this  is  nearly  the  strongest  backward  error  bound  possi- 
ble. 

The  symmetric  tridiagonal  matrices  whose  eigenvalues  we  can  prove  are  insensitive  are 
the  diagonally  dominant  ones,  where  we  define  diagonal  dominance  as  follows: 

Derinition:  Let  A  be  a  symmetric  tridiagonal  matrix  with  diagonal  entries  O],  .  .  .  ,a„  and 
offdiagonal  entries  *, ,...,*„_,.  Let  G  be  the  symmetric  tridiagonal  matrix  with  zero  diag- 
onal and  offdiagonal  entries  i>,- |a,-a2  |-"2,  ...  fc,- |a,.fl,  +  ,  j-i'Z  ...  fc„_,-|fl„_,.a„  [-I'Z.  Let 
7=  ||G||.    If  7<1  we  say  A  is  diagonally  dominant  by  the  factor  y. 

Obviously,  if  A  has  positive  diagonal  entries  and  is  diagonaUy  dominant,  it  is  positive 
definite.  The  definition  applies  to  indefinite  matrices  as  well.   We  may  now  state 

Theorem  7:  Let  A  and  A  +BA  be  symmetric  tridiagonal  matrices  where  A  has  positive  diago- 
nal entries  and  is  diagonally  dominant  by  the  factor  -y,  and  |6/1  |  s  t)  |/1  |.  Let  X,a  •  •  •  ax, 
be  the  eigenvalues  of  >4  and  X/a  •  •  •  a:X„'  be  the  eigenvalues  oiA+hA.  If 

Oso,-  2(2n-l)-nfl+8>^) 

(1-7)'(|P-  -y)-  8W 

l-t-t) 
then 

|X/  -  X,|s3w|X,| 
Note:  When  txI,  to  =  2(2n  -  l)T,(l-t- 8-y^)(l-'Y)-*. 

Proof:  Let  D  =  diag(Vfl7 V^),  A  =  D'^Ap-^   and   hA=D-^bAD-K     Note   that 

A  =  I  +  G,   where  G   has   zero   diagonal   and  offdiagonal  entries  fc,(a,(j,  +  ,)""^.   Thus  by 
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Theorem  1,  (I-7)  s  \{A)  s  (l  +  -y).  Also  \hA  \  s  •n(/  +  C). 

Let  LL^= A,  LL^= A,  (L  +  hL]lL  +  hLy=A  +  hA,  and  (L  +  bL)(L  +  bZy = A  +  hAhc  Chok- 
sky  factorizations.  Clearly  L^DL  and  hL  =  DhL.  Our  goal  is  to  show  that  |6i,  |  s  t|Z.|, 
where  t  depends  only  on  7  and  i).  Then  |6L  |  s  t|Z.  |  and  by  Theorem  2  the  singular  values 
a,'  of  L  +  81.  will  differ  from  the  singular  values  a,  of  L  by  at  most 

1  -  (2n-l)T      '    " 

Therefore  their  squares,  the  eigenvalues  X,'  of  A  +M,  will  differ  from  the  eigenvalues  X,  of 
A  by  at  most 


f    2(2n-l)T      ^         {In-lf.^       1  .,     , 
[1-  (2n-l)T         (1-  (2n-l)T)2j 


If  we  assume  u  ■■  2(2ii  -  1)t<1,  this  last  bound  is  in  turn  bounded  by  3(i>. 

We  estimate  t  as  follows.  Let  8Ay  =  ijy'A,;,,   \r\ij\  s  t\.  Using  block  Cholesky  we  can 
write 

L-   =  1  -  rfr-,(A(;.,))-'rf,-,    . 

where  A{i_i)  is  the  leading  i-1  by  ;-l  8ubmatrix_of  A,  and  </,_]  is  a  column_vector  whose 
first  j-2  entries  are  zero  and  whose  last  entry  is  i4,_i_,.  Note  that  L\  a  X„u,(A)  ^  I-7  and 

SO</f-i(A(,-,))"'<f,-i   S  -If. 

Using  block  Cholesky  once  again  we  get 

(L,  +  6L„)2  =  1  +  lAu  -  (l  +  Ti,-,.,)2d/".,((A  +  8A)(,_„)-'j,_,    . 

Now  (M+8W)"'  =  M"'  -  Af"'8AfAf^'(/+_8AfAf~')~'  -_Af"'  +  A,  where 

||A||  s  |lAf|p||8A/||  /  (1-||8A/||-||A/->||).    Then  ((A +  8/1 )(,_,))"'  =  (A,,-,,)"' +  A,-,  where 

1  -  ■n(i+")f)(i-7) 

Now  we  can  write 

(L,i+hL„y  =  1  +  Ti,,  -  (l  +  T,,-i,,)^^?"-i((A(.-i))"'  +  A.-iH-i 

=  Zj,  +  T)„  -  (2Ti,_,,,  +  Ti?-i,,)rfr-i(A(,-i))"''/,-i  -  (l  +  Ti,_i.,)2</r-iA,-,rf,-, 

-=  (r„(i+T„))2 

and  estimate 


K„|  ^  (l-.)-h  +  (2.  +  Vh  +    (l^.)^(1^7)(l-7)-y.] 


1+T) 

as  long  as  •y<(l-'n)(l  +  T|)~'  (the  final  condition  on  7  will  be  more  stringent  than  this). 
Then 

7  ,    .7  _  •^i.i-l    +    8Ai,_i  1   +    T)|,-i  ^i.i-1 


1   +    T,-i.,_i 

We  estimate 


^■.1-1    "    U  +  T/i-lJ-i.,,-! 
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,  I  T1    +     |t.-i,,-i|  Tld+S^') 

1  +  1) 
as  desired.  □ 

For  example,  the  following  matrix  [Parlett,  p.  162] 

1     1 

1    10'    10^ 
i4  =  10^    10«    10' 

10'    10'    10* 
10«    10" 

is  diagonally  dominant  by  the  factor  7=. 0318.  Therefore  relative  perturbations  of  size  ■n<.01 
in  each  entry  will  cause  relative  perturbations  in  the  eigenvalues  of  no  more  than  6411. 

The  algorithm  we  propose  has  three  easy  steps: 
Finding  eigenvalnes  of  the  positive  definite  symmetric  tridiagonal  matrix  A 

1)  Compute  the  Cholesky  factorization  A  =LL^. 

2)  Find  the  singular  values  of  the  bidiagonal  matrix  L  using  the  algorithm  of  section  4. 

3)  Square  the  singular  values  of  L  to  get  the  eigenvalues  of  A. 

The  following  theorem,  whose  proof  is  given  in  section  8,  bounds  the  backward  error  in 
step  1): 

Theorem  8:  Let  A  be  an  n  by  n  positive  definite  symmetric  tridiagonal  matrix  with  diagonal 
entries  aj,  .  .  .  ,a„  and  off  diagonal  entries  fcj,  .  .  .  ,fc„-i.  Let  L  be  the  computed  Cholesky 
factor  from  the  following  algorithm: 

'11  =  '^'''(fli) 
for  ('  =  1  to  n  —  1 

'1  +  1. i   -    *i/'ii 

'.  +  1..  +  1  =  *9rf(fl,  +  i-/?+i.,) 
endfor 

Then  barring__  over/underflow  and  ^attempts  to  take  square  roots  of  negative  arguments, 
L  =  L  +  hL,  \hL\  s  5.5e  |L  I,  where  L  is  the  exact  Cholesky  factor  of  the  matrix  A  +hA=LL  , 
where  \hA  |  s  3t  |A  |,  and  the  diagonal  of  8A  is  zero. 

Since  we  can  find  all  the  singular  values  of  the  computed  L  accurately,  this  algorithm 
can  accurately  find  all  the  eigenvalues  of  a  matrix  which  differs  from  the  original  A  only  by  3 
units  in  the  last  place  of  each  offdiagonal  entry.  This  is  independent  of  the  sensitivity  of  the 
eigenvalues  of  A.  If  i4  is  diagonally  dominant  so  that  Theorem  7  applies,  the  computed  eigen- 
values will  also  be  accurate  eigenvalues  of  A. 

A  slightly  different  example  is  the  standard  second  centered  difference  matrix  A  which 
has  2's  on  the  diagonal  and  -I's  on  the  first  offdiagonals.  The  entries  of  A's  Cholesky  factor  L 
are  given  explicitly  by  /.„=  (1+  i~')"^  and  Lj  ,  +  ]  =  (i7(l+  «))"^.  so  A's  eigenvalues  may  also  be 
computed  accurately  by  this  method.  In  cases  like  this  one  it  may  be  difficult  to  determine  a 
good  bound  on  the  diagonal  dominance  beforehand,  but  after  Cholesky  one  may  simply 
examine  the  squared  diagonal  entries  l},  to  see  bow  much  smaller  than  a,  they  are;  if  l},/a,  is 
sufficiently  larger  than  €  for  all  i,  it  is  easy  to  see  that  all  the  /^  will  have  been  computed 
accurately,  so  the  computed  singular  values  will  be  accurate  as  well. 
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A  theorem  analogous  to  Theorem  7  is  true  for  indefinite  diagonally  dominant  symmetric 
tridiagonal  matrices:  high  relative  accuracy  in  the  nonzero  entries  implies  high  relative  accu- 
racy in  all  eigenvalues  [Kahan,  p54].  Unfortunately,  the  trick  with  Cholcsky  does  not  work 
on  indefinite  matrices.  In  order  to  accurately  compute  the  tiny  eigenvalues  of  these  matrices, 
we  will  need  to  use  the  algorithm  of  the  next  section. 
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6.  Accnrate  Singular  Values  from  Sylvester's  Law  of  Inertia 

In  this  section  we  give  a  second  metliod  for  computing  the  singular  values  of  a  bidiago- 
nal  matrix  B  to  high  relative  accuracy.  It  is  not  competitive  in  speed  with  QR  on  a  serial 
machine,  but  can  efficiently  verify  whether  a  computed  singular  value  is  accurate  or  not.  We 
have  used  it  to  verify  the  numerical  results  presented  in  section  7. 

The  algorithm  is  based  on  Sylvester's  Law  of  Inertia,  or  equivalently,  Sturm  sequences 
[Parlett,  p.  52].  As  explained  in  section  2,  the  number  of  negative  d,  in  recurrence  (2.1)  is 
the  number  of  eigenvalues  less  than  x,  a  quantity  we  will  denote  by  v{x).  v{x2)  —  v(xi)  is 
therefore  the  number  of  eigenvalues  in  the  interval  [xi.xi),  so  this  method  lets  us  easily  ver- 
ify whether  an  interval  contains  any  eigenvalues.  The  identification  of  the  singular  value 
problem  for  the  bidiagonal  matrix  B  with  the  eigenvalue  problem  for  a  symmetric  tridiagonal 
with  zero  diagonal  later  in  section  2  makes  it  clear  that  we  can  use  the  same  method  to  count 
the  number  of  singular  values  in  any  interval  [x|,Z2)- 

What  remains  is  an  error  analysis  to  show  that  the  function  v{x)  is  accurate.  This  is  pro- 
vided by  an  analysis  referred  in  [Kahan,  p. 35]: 

Let  v(x)  be  the  computed  number  of  eigenvalues  less  than  x  for  a  symmetric  tridiagonal 
matrix  A.  Barring  over/underflow,  the  computed  value  of  v(x)  is  the  exact  value  of 
v(x)  for  a  the  perturbed  matrix  A+hA  where  \bA\  s  2*  |offdiag(/i)  |  +  ex/.  Here, 
offdiag(A)  refers  to  the  off  diagonal  part  of  A.  If  A  has  a  zero  diagonal,  this  bound  may 
be  improved  to  \bA  |  s  1.5-e  |i4  |. 

Therefore,  by  Theorem  2  or  Corollary  2,  if  the  computed  value  of  v(x)  is  *,  there  must 
be  at  least  k  singular  values  of  B  less  than  x/(l-(3n  - 1.5)€)  and  no  more  than  k  singular 
values  less  than  x(l-(6n -2)e)/(l- (3n  -  1.5)e).  If  the  computed  value  of  v(x2)-v(xi)  is  y, 
there  must  be  at  least  j  singular  values  in  the  interval 
[x,(l-(6n-2)«)/(l-(3n-1.5)£  ,  x^/Cl- (3n  -  1.5)e)). 

Combined  with  the  perturbation  theory  of  the  last  section,  this  error  analysis  shows  that 
we  may  use  Sylvester's  theorem  to  compute  all  eigenvalues  of  diagonally  dominant  symmetric 
tridiagonal  matrices  to  nearly  full  relative  precision,  since  for  x  near  an  eigenvalue  X,,  the 
perturbation  ex/  can  only  cause  a  small  relative  error  in  X,.  Such  an  approach  parallelizes 
easily,  as  shown  in  [Lo  et  al.]. 

There  is  one  other  important  feature  of  the  computed  v(x)  to  discuss.  In  exact  arith- 
metic, since  v(x)  is  the  number  of  eigenvalues  less  than  x,  v(x)  must  be  a  monotonic  increas- 
ing function  of  x.  It  is  by  no  means  clear  that  the  computed  values  of  v(x)  should  also  be 
monotonic.  This  is  significant  because  a  failure  in  monotonicity  could  cause  an  algorithm  to 
misestimate  the  number  of  eigenvalues  in  an  interval,  although  a  bisection  routine  which 
begins  with  an  interval  [xi,X2)  where  v(x2)-v(x,)  is  positive  can  always  maintain  an  interval 
over  which  the  computed  value  of  v  increases.  It  turns  out,  however,  that  as  long  as  the  arith- 
metic is  monotonic,  the  computed  value  of  v(x)  will  be  monotonic  [Kahan,  p.  27].  By  mono- 
tonic arithmetic  we  mean  that  if  a'b^cd  in  exact  arithmetic,  then  /l(aob)^fl(c<>d)  as  well. 
This  holds  in  any  well-designed  arithmetic,  such  as  the  IEEE  Floating  Point  Standard  754 
[IEEE].  We  have  only  shown  monotonicity  holds  if  the  recurrence  is  computed  exactly  as  fol- 
lows, with  the  order  of  evaluation  respecting  parentheses: 
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7.  Namerical  experimenU 

The  numerical  experiments  we  discuss  here  compare  the  hybrid  algorithm  of  section  5 
with  the  LINPACK  SVD  [LINPACK].  Both  codes  were  run  in  double  precision  on  a  SUN  3 
with  a  MC68881  floating  point  coprocessor,  which  implements  IEEE  standard  754  floating 
point  arithmetic;  the  machine  precision  c  =  2"*'  =  1.110~"  and  the  range  is  approximately 

The  codes  were  compared  with  respect  to 

accuracy, 

total  number  of  inner  loops, 

elapsed  time  when  computing  singular  values  only, 

elapsed  time  when  computing  both  left  and  right  singular  vectors  as  well, 

elapsed  time  including  bidiagonalizing  the  input  matrix,  and 

elapsed  time  excluding  bidiagonalizing  the  input  matrix. 

Also,  the  dependence  of  the  new  algorithm  on  the  parameter  tol  (see  section  4)  was  investi- 
gated. At  the  end  we  comment  on  the  implications  of  our  results  for  the  "perfect  shift'  stra- 
tegy for  computing  singular  vectors. 

It  turns  out  that  the  results  depend  strongly  on  the  form  of  the  bidiagonal  matrix.  For 
example,  the  standard  SVD  behaves  entirely  differently  on  matrices  graded  from  top  to  bot- 
tom than  on  matrices  graded  in  the  opposite  direction.  Therefore,  we  present  our  results  on 
12  separate  classes  of  bidiagonal  matrices,  since  this  seems  to  be  the  only  fair  way  to  com- 
pare results.  The  classes  are  as  follows: 

Class  1: 

These  8  matrices  are  graded  in  the  usual  way  from  large  at  the  upper  left  to  small  at  the 
lower  right.  All  matrices  have  a  1  in  the  upper  corner,  and  each  superdiagonal  entry 
Bi_i  +  \  equals  its  neighbor  £,,  on  the  diagonal.  Four  of  the  matrices  are  10  by  10  and 
have  a  constant  multiple  between  adjacent  entries  on  the  diagonal  and  superdiagonal: 
10'°,  10^  10-  and  10.  The  other  four  are  20  by  20  and  are  obtained  from  the  first  four 
by  simply  repeating  each  entry  once,  e.g.  a  diagonal  containing  1,  10~'°,  10"^°,  ...  , 
10"'°  becomes  1,  1.  lO"'",  10"'°,  10-^°,  10-^°,  ...  ,  10"'°,  10-'°. 

Class  2: 

This  class  is  identical  to  class  1  except  the  order  of  the  entries  on  the  diagonal  and 
superdiagonal  are  reversed.  Thus  these  matrices  are  graded  from  small  at  the  upper  left 
comer  to  large  at  the  lower  right. 

Class  3: 

These  eight  20  by  20  and  40  by  40  matrices  are  obtained  by  abutting  those  in  class  1 
with  their  reversals  in  class  2.  Thus  each  matrix  is  small  at  the  upper  left,  large  in  the 
middle,  and  small  again  at  the  lower  right. 

Class  4: 

These  eight  20  by  20  and  40  by  40  matrices  are  obtained  by  abutting  those  in  class  2 
with  their  reversals  in  class  1.  Thus  each  matrix  is  large  at  the  upper  left,  small  in  the 
middle,  and  large  again  at  the  lower  right. 

Class  5: 

These  8  matrices  are  obtained  from  class  1  by  reversing  the  order  of  the  superdiagonals. 
Thus  the  diagonal  is  graded  from  large  at  the  upper  left  to  small  at  the  lower  right,  and 
the  superdiagonal  is  graded  in  the  opposite  direction. 

Class  6: 

These  8  matrices  are  obtained  from  class  5  by  reversing  the  order  of  both  the  diagonals 
and  superdiagonals.  Thus  the  diagonal  is  graded  from  small  at  the  upper  left  to  large  at 
the  lower  right,  and  the  superdiagonal  is  graded  in  the  opposite  direction. 
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Class  7: 

These  16  matrices  are  all  small  on  the  diagonal  and  mostly  large  on  the  offdiagonal. 
Eight  of  them  are  10  by  10  with  I's  on  the  off  diagonal  and  a  constant  diagonal,  equal- 
ing 10"^  10'",  10"',  10-^  10"'°,  10"'^,  10"'*,  and  10"",  respectively.  The  other 
eight  20  by  20  matrices  are  obtained  by  putting  two  copies  of  each  of  the  first  eight 
together,  and  "connecting"  them  by  setting  the  middle  offdiagonal  entry  flio.n  to  be 
10"'*  times  the  value  of  the  diagonal  entries. 

Classes  8-11: 

The  ten  20  by  20  matrices  in  each  class  arc  generated  by  letting  each  bidiagonal  entry  be 
a  random  number  of  the  form  r-10',  where  r  is  a  random  number  uniformly  distributed 
between  -.5  and  .5,  and  t  is  a  random  integer.  In  class  8,  i  is  uniformly  distributed 
from  0  to  -15.  In  class  9,  i  is  uniformly  distributed  from  0  to  -10.  In  class  10,  i  is 
uniformly  distributed  from  0  to  -5.  In  class  11,  /  is  identically  0.  Thus,  in  class  11  each 
matrix  entry  is  simply  uniformly  distributed  on  [-  .S,.S]. 

Class  12: 

This  one  41  by  41  matrix  is  graded  in  as  in  class  1,  with  the  ratio  of  adjacent  entries 
being  10"  '  =  .79.  Each  offdiagonal  entry  is  identical  to  the  diagonal  entry  below  it.  This 
very  dense  grading  leads  to  different  convergence  properties  than  for  the  matrices  in 
class  1,  which  is  why  we  put  this  example  in  a  separate  class. 

Thus  classes  1-6  and  12  consist  of  graded  matrices,  class  7  consists  of  matrices  larger  on 
the  offdiagonal  than  the  diagonal,  and  classes  8-11  consist  of  random  matrices  with  random 
exponents. 

First  we  discuss  the  accuracy  of  computed  singular  values.  With  to/=10""  the  new 
algorithm  computed  all  singular  values  to  nearly  fuU  accuracy.  Accuracy  was  determined 
using  the  method  in  Section  6:  If  cr  is  a  computed  singular  value,  the  number  of  singular 
values  in  the  interval  [CT(l-n€)  ,  a{l  +  ni))  were  counted.  Overlapping  intervals  were  joined 
into  larger  intervals.  The  number  of  computed  singular  values  in  each  interval  was  then  com- 
pared with  the  true  number  of  singular  values  in  each  interval.  This  accuracy  test  was  passed 
in  all  cases,  meaning  at  least  about  14  digits  were  guaranteed  correct  in  all  computed  singular 
values. 

The  accuracy  of  the  singular  values  computed  by  the  LINPACK  SVD  were  determined 
by  comparison  with  the  singular  values  from  the  new  algorithm.  This  data  is  presented  in 
Table  1.  We  called  agreement  to  at  least  14  digits  with  the  verified  correct  results  of  the  new 
algorithm  "all  digits  correct";  the  notation  "%  all  digits"  in  Table  1  means  the  percentage  of 
such  singular  values.  The  notation  "%  m-n  digits"  in  Table  1  means  the  percentage  of 
singular  values  computed  with  m  to  n  correct  digits.  0  digits  means  that  the  order  of  magni- 
tude is  still  correct.  - 1  digits  means  correct  to  within  a  factor  of  10.  The  column  "% 
nonzero,  no  digits"  gives  the  percentage  of  computed  singular  values  which  were  nonzero  and 
bad  incorrect  orders  of  magnitude.  The  column  '%  zero,  no  digits"  gives  the  percentage  of 
computed  singular  values  which  were  exactly  zero,  even  though  the  matrix  was  nonsmgular. 
The  *  in  row  4  indicates  that  the  algorithm  did  not  converge  for  one  of  the  test  matrices  (this 
matrix  was  not  counted  In  computing  the  percentages). 
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Table  1:  Accuracy  of  Singular  Values  from  LENPACK  SVD 

Class 

%  all 

%  12-14 

%  8-12 

%  4-8 

%0-4 

%  -1 

%  nonzero, 

%  zero. 

digits 

digits 

digits 

digits 

digits 

digits 

no  digits 

no  digits 

1 

100 

0 

0 

0 

0 

0 

0 

0 

2 

42 

14 

11 

0 

29 

1 

0 

3 

3 

99.5 

.5 

0 

0 

0 

0 

0 

0 

4* 

57 

12 

5 

1.5 

20.5 

1.5 

0 

2.5 

5 

94 

0 

0 

0 

0 

0 

0 

6 

6 

94 

0 

0 

0 

0 

0 

0 

6 

7 

91 

0 

.5 

0 

.5 

0 

1 

7 

8 

80.5 

4.5 

5 

3 

1 

.5 

.5 

5 

9 

80 

6.5 

7 

2.5 

1 

0 

0 

3 

10 

91 

4.5 

3 

1.5 

0 

0 

0 

0 

11 

98 

2 

0 

0 

0 

0 

0 

0 

12 

100 

0 

0 

0 

0 

0 

0 

0 

One  striking  feature  about  this  table  is  the  difference  between  classes  1  and  2.  The  only 
difference  between  the  matrices  in  classes  1  and  2  is  the  order  of  the  entries.  When  the 
entries  are  graded  from  large  to  small,  the  standard  SVD  gets  all  the  singular  values  correct. 
When  they  are  graded  in  the  opposite  way,  only  42%  are  fully  correct  and  another  third  have 
fewer  than  4  digits  correct.  3%  are  computed  as  0  even  though  all  matrices  tested  were  non- 
singular.  This  happens  because  the  standard  SVD  always  "chases  the  bulge"  from  top  to  bot- 
tom. When  the  matrix  is  graded  from  large  to  small,  this  works  well,  but  when  it  is  graded  in 
the  opposite  way  as  in  class  2,  the  algorithm  must  "reorder"  all  the  matrix  entries,  and  in 
doing  so  must  combine  tiny  entries  with  large  entries,  thereby  losing  precision.  The  same 
thing  happens  for  class  4.  The  new  algorithm  avoids  the  need  to  reorder  by  always  "chasing 
the  bulge"  from  the  large  to  the  small  end  of  the  matrix.  This  is  also  done  in  the  algorithm  in 
[Van  Huffel,  et  al];  see  section  9  for  details.  The  three  nonzero  singular  values  which  are  not 
even  order  of  magnitude  correct  are  off  by  factors  of  10"**  and  10""  (class  7)  and  10"' 
(class  8).  The  last  column  indicates  how  often  the  computed  singular  values  were  exactly 
zero,  when  in  fact  none  of  the  test  matrices  were  singular. 

We  evaluated  the  computed  singular  vectors  by  computing  the  norm  of  the  residual 
BV-t/2,  where  B  is  the  bidiagonal  matrix,  V  contains  the  right  singular  vectors,  U  the  left 
singular  vectors,  and  2  the  singular  values.  The  norm  was  the  maximum  absolute  matrix 
entry.  In  all  cases  for  both  new  and  old  SVD  this  measure  never  exceed  about  310"'^,  or 
30e,  which  is  quite  good  and  as  expected  from  both  algorithms  (it  is  easy  to  show  the  conver- 
gence criteria  for  both  algorithms  leave  the  residual  near  the  roundoff  level).  We  do  not  yet 
have  any  perturbation  theory  or  better  accuracy  tests  for  the  singular  vectors;  see  section  10 
for  further  discussion. 

Table  2  gives  timing  comparisons  between  the  old  and  new  algorithms.  There  were 
several  statistics  collected.  First,  the  number  of  inner  loops  for  each  algorithm  was  counted, 
and  the  ratio  of  inner  loops  for  the  new  algorithm  to  inner  loops  for  the  old  algorithm  com- 
puted; these  statistics  (minimum,  average  and  maximum  ratios,  the  same  for  the  other  statis- 
tics) are  shown  in  columns  2-4  of  Table  2.  The  timings  depend  on  whether  singular  vectors 
are  computed  (Job=  v  in  Table  2)  or  not  (Job  =  nv).  They  also  depend  on  whether  we  count 
the  time  to  bidiagonalize  or  not.  The  time  to  bidiagonalize  is  quite  large  and  can  swamp  the 
second,  iterative  part.  Therefore  we  computed  timing  ratios  (new  algorithm  to  old  algorithm) 
both  with  and  without  the  initial  bidiagonalization  (when  bidiagonalizing  we  performed  the 
entire  algorithm,  not  permitting  the  algorithm  to  recognize  it  was  dealing  with  a  bidiagonal 
input   matrix   and   bypassing   some   of  the   work).     Columns   6-8   of   Table   2   include   the 
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bidiagonalizatioD  phase,  and  columns  9-11  exclude  it. 


Table  2:  Timing  Comparisons  of  Old  and  New  SVD  Algorithms  with  tol=\0   '*              1 

Class 

Ratio  of  Inner  Loops 

New  SVD  /  Old  Svd 
Min       Avg        Max 

Job 

Ratio  of  Times 

(with  bidiagonalization) 

New  SVD  /  Old  SVD 

Min        Avg          Max 

Ratio  of  Times 
(without  bidiagonalization) 

New  SVD  /  Old  SVD 
Min          Avg             Max 

1 

.20         .53          .95 

nv 

V 

.62          .68            .77 
.50          .67            .88 

.27            .45               .67 
.22            .50               .84 

2 

.13          .27          .41 

nv 

V 

.32          .47            .69 
.28          .42            .57 

.15            .26               .35 
.13            .27               .37 

3 

.99       1.02        1.05 

nv 

V 

.83          .85            .88 
.94          .96          1.00 

.67            .72               .75 
.90            .95             1.00 

4 

.47         .60          .83 

nv 

V 

.59          .71            .89 
.60          .68            .88 

.38            .49               .69 
.46            .58               .80 

5 

1.20       1.71        2.07 

nv 

V 

1.08  1.15           1.22 

1.09  1.24          1.43 

1.13          1.73             2.64 
1.34          1.56             1.86 

6 

1.20       1.78        2.07 

nv 

V 

1.14        1.17           1.22 
1.06        1.27          1.53 

1.32          1.70             2.49 
1.27          1.56             1.77 

7 

.66         .90        1.11 

nv 

V 

.72          .88          1.15 
.73          .91           1.18 

.61            .95             2.48 
.65            .97             1.89 

8 

.48       1.02        1.78 

nv 

V 

.68          .89          1.11 
.66          .94          1.30 

.46            .81             1.28 
.50            .94             1.55 

9 

.55         .93        1.13 

nv 

V 

.65          .83            .92 
.64          .90          1.04 

.46            .71                .85 
.53            .87             1.05 

10 

.76       1.11         1.45 

nv 

V 

.69          .87          1.00 
.75         1.01           1.26 

.56            .81              1.02 
.69          1.02             1.34 

11 

.86       1.02        1.20 

nv 

V 

.77          .86            .97 
.85          .97          1.11 

.72            .82               .95 
.82            .96             1.14 

12 

2.12       2.12        2.12 

nv 

V 

1.45         1.45          1.45 
1.79        1.79          1.79 

1.77          1.77             1.77 
2.05          2.05             2.05 

Whenever  a  number  less  than  1  appears  in  the  table,  it  means  the  new  algorithm  was 
faster,  and  numbers  greater  than  1  indicate  the  old  algorithm  is  faster.  An  examination  of  the 
table  shows  that  on  the  whole  the  performance  of  the  two  algorithms  is  comparable,  the  new 
algorithm  varying  from  over  3.6  time  faster  (class  2,  counting  bidiagonalization)  to  1.8  times 
slower  (class  12,  counting  bidiagonalization).  Not  counting  bidiagonalization  the  extremes  are 
7.7  times  faster  to  2.6  times  slower;  the  extra  overhead  of  bidiagonalization  moderates  the 
extremes.  On  simply  graded  matrices  (classes  1-4),  the  new  algorithm  did  significantly  better 
than  the  old  overall.  With  the  diagonal  and  offdiagonal  being  graded  differently  (classes  5-6), 
they  were  more  evenly  matched.  In  these  classes  the  largest  ratios  occurred  in  examples 
where  convergence  was  very  fast  with  both  algorithms,  the  old  SVD's  faster  convergence  cri- 
terion winning  out  over  the  new  algorithm's  more  careful  but  more  expensive  convergence 
test.  In  class  7  and  many  examples  in  classes  5-6  there  were  generally  a  few  very  small 
singular  values  and  the  rest  large  and  evenly  spaced  over  a  range  of  at  most  a  few  factors  of 
10;  the  new  algorithm  deflated  out  the  smallest  singular  values  after  1  or  2  sweeps  and  spend 
the  rest  of  the  time  working  on  the  closely  spaced  singular  values.  It  appears  our  criterion  for 
choosing   between  zero   and   nonzero  shift  chooses  the  zero  shift  quite  often,  sometimes 
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sacrificing  cubic  convergence  until  many  singular  values  have  been  deflated.  The  single 
matrix  in  Class  12  was  therefore  chosen  with  very  closely  spaced  singular  values  in  order  to 
make  the  new  algorithm  perform  as  poorly  as  possible;  in  this  example  the  average  number 
of  (zero-shift)  QR  sweeps  per  singular  value  was  4  for  the  new  algorithm,  whereas  the  aver- 
age number  of  (shifted)  QR  sweeps  per  singular  value  was  less  than  2  for  the  old  algorithm, 
which  still  computed  them  aU  correctly.  We  are  not  currently  able  to  find  another  criterion 
permitting  more  frequent  nonzero  shifts  while  still  guaranteeing  high  relative  accuracy. 
Nonzero  shifts  for  fairly  small  singular  values  frequently  do  not  cause  inaccuracy  in  practice 
because  small  rotation  angles  prevent  mixing  large  and  small  magnitude  matrix  entries; 
unfortunately  this  phenomenon  seems  hard  to  exploit  systematically. 

We  next  present  some  timings  for  our  algorithm  with  tol=10~^.  This  low  accuracy 
requirement  speeds  up  the  algorithm  while  still  providing  ordcr-of-magnitude  correct  singular 
values;  thus  it  may  be  of  use  for  rank  determination.  Only  "Job=nv"  (singular  values  only) 
cases  were  run.  On  the  average  the  new  algorithm  is  always  faster  than  the  old.  Again  there 
are  some  cases  in  classes  5-7  where  both  algorithms  converge  very  fast,  but  the  slower  con- 
vergence test  in  the  new  algorithm  makes  it  appear  much  slower  in  some  cases.  In  all  cases 
the  computed  singular  values  were  good  to  at  least  2  figures  as  expected. 


Table  3: 

Timing  Comparisons  of  Old  and  New  SVD 

Aleorithms  with  fo/=  10  ^ 

Class 

Ratio  of  Inner  Loops 

Ratio  of  Times 

Ratio  of  Times 

(with  bidiagonalization) 

(without  bidiagonalization) 

New 

SVD  /  Old  Svd 

New  SVD  /  Old  SVD 

New  SVD  /  Old  SVD 

Min 

Avg         Max 

Min        Avg          Max 

Min          Avg             Max 

1 

.14 

.19            .27 

.43          .55              .62 

.19            .24                 .32 

2 

.05 

.12            .25 

.23          .40             .68 

.08            .16                 .33 

3 

.51 

.70           .85 

.68          .78              .85 

.49            .59                 .70 

4 

.21 

.39           .76 

.44          .63              .87 

.24            .36                 .64 

5 

.10 

.44         1.20 

.47          .83           1.15 

.19            .92              2.19 

6 

.12 

.45         1.20 

.52          .84           1.14 

.22            .87              2.01 

7 

.08 

.22         1.11 

.28          .50           1.16 

.11            .41               2.46 

8 

.12 

.21            .37 

.53          .66             .81 

.21            .32                 .49 

9 

.08 

.18           .30 

.45          .55              .61 

.15            .24                 .33 

10 

.06 

.16            .25 

.42          .47             .55 

.12            .21                 .29 

11 

.32 

.40           .45 

.48          .51              .55 

.30            .36                 .42 

12 

.27 

.27           .27 

.52          .52              .52 

.29            .29                 .29 

As  mentioned  at  the  end  of  section  4  on  convergence  criteria,  we  may  use  the  much  less 
stringent  criteria  (4.3)  and  (4.4)  if  only  absolute  accuracy  rather  than  relative  accuracy  in  the 
singular  values  is  desired.  In  Table  4  we  show  timing  comparisons  between  LINPACK  SVD 
and  the  new  algorithm  where  each  singular  value  is  computed  to  an  absolute  accuracy  of 
to/||A||  =  10"'*-||A|1.   The  format  is  the  same  as  in  Table  2. 

As  can  be  seen  from  Table  4,  the  absolute  convergence  criterion  almost  always  leads  to 
faster  convergence  than  the  LINPACK  SVD.  In  fact,  that  it  is  not  uniformly  faster  is  some- 
what surprising.  Those  occasions  when  the  new  algorithm  takes  more  rotations  than  LIN- 
PACK seem  to  occur  because  our  decision  algorithm  for  deciding  whether  to  chase  the  bulge 
from  top  to  bottom  or  bottom  to  top  makes  a  bad  decision  (we  always  chase  from  the  large 
end  to  the  small  end).  These  bad  decisions  only  occur  when  the  matrix  is  not  graded  in  an 
obvious  way  (classes  5,  6,  10  and  11).  In  no  case  did  the  new  algorithm  take  more  than  2.25 
QR  sweeps  per  singular  value  on  the  average.  More  work  is  needed  to  understand  this 
phenomenon. 
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Tabic  4:  Timine  Comparisons  with  Absolute  Accuracy  10   ^^'W^]] 

Class 

Ratio  of  Inner  Loops 

New  SVD  /  Old  Svd 
Min       Avg        Max 

Job 

Ratio  of  Times 
(with  bidiagonalization) 

New  SVD  /  Old  SVD 
Min        Avg          Max 

Ratio  of  Times 
(without  bidiagonalization) 

New  SVD  /  Old  SVD 
Min          Avg             Max 

1 

.04          .31          .74 

nv 

V 

.49          .59              .73 
.40          .54              .76 

.10            .29                 .62 
.08            .32                 .69 

2 

.04          .14          .32 

nv 

V 

.28          .41              .59 
.22          .34              .45 

.06            .15                 .28 
.05            .15                 .30 

3 

.07          .41           .84 

nv 

V 

.57          .67              .80 
.39          .60              .84 

.12            .35                 .66 
.10            .40                .79 

4 

.03          .27          .86 

nv 

V 

.46          .60             .77 
.30          .49              .84 

.07            .25                 .66 
.05            .27                 .81 

5 

.20          .58         1.19 

nv 

V 

.72          .91            1.01 
.70          .89            1.10 

.38            .70              1.02 
.31            .65               1.13 

6 

.20          .63         1.45 

nv 

V 

.72          .93            1.12 
.70          .91            1.22 

.37            .71               1.19 
.29            .68               1.33 

7 

.72          .90        1.00 

nv 

V 

.74          .84            1.03 
.78          .90            1.09 

.62            .79               1.22 
.69            .90              1.45 

8 

.29          .49          .84 

nv 

V 

.56          .73              .88 
.48          .66              .87 

.26            .45                 .74 
.28            .47                 .78 

9 

.30          .63           .78 

nv 

V 

.53          .72             .83 
.45          .71              .83 

.28            .53                 .69 
.30            .60                 .76 

10 

.56          .85         1.19 

nv 

V 

.63          .79              .96 
.61           .84            1.09 

.47            .69                 .93 
.52            .80              1.11 

11 

.93        1.04         1.12 

nv 

V 

.80          .85              .90 
.90          .98            1.04 

.74            .81                 .87 
.88            .97               1.05 

12 

.83          .83          .83 

nv 

V 

.75          .75              .75 
.83          .83              .83 

.64            .64                .64 

.78            .78                 .78 

Finally,  we  discuss  the  implications  of  our  results  for  the  "perfect  shift"  strategy  for 
computing  singular  vectors  (or  eigenvectors).  This  strategy  advocates  computing  the  singular 
values  (or  eigenvalues)  by  the  quickest  available  method  without  accumulating  singulu  vec- 
tors, and  then  using  these  computed  singular  values  as  "perfect  shifts"  in  the  QR  iteration  to 
compute  the  singular  vectors  in  1  or  possibly  2  QR  sweeps.  The  hope  is  that  by  avoiding  the 
work  of  accumulating  vectors  while  converging  to  accurate  singular  values,  time  will  be  saved 
by  computing  the  singular  vectors  afterwards  in  one  or  two  sweeps  each.  Unfortunately,  our 
numerical  results  indicate  this  approach  will  not  work  in  general.  For  when  our  hybrid  algo- 
rithm chooses  to  do  an  implicit  zero  shift,  it  is  in  fact  doing  a  perfect  shift  within  the  limits  of 
roundoff  error.  Depending  on  the  distribution  of  singular  values,  this  can  take  more  or  less 
time  to  converge.  Therefore  one  cannot  assume  1  or  2  sweeps  with  the  "perfect  shift"  will 
results  in  converged  singular  vectors,  and  we  could  well  end  up  doing  as  many  sweeps  to 
compute  the  singular  vectors  as  the  singular  values.  This  will  not  happen  in  general,  and  a 
clever  algorithm  might  be  able  to  decide  when  perfect  shifts  are  useful  and  then  use  them, 
perhaps  by  keeping  track  of  which  deflated  subblocks  of  the  matrix  do  not  require  zero  shifts 
and  using  the  perfect  shift  strategy  on  them. 
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8.  Detailed  Error  Analysis 

In  this  section  we  present  detailed  error  analyses  of  the  implicit  zero-shift  QR  algorithm 
(Theorems  5  and  6)  and  Cholesky  factorization  (Theorem  8).  We  begin  with  the  assump- 
tions used  in  the  error  analyses.  Our  model  of  error  in  floating  point  arithmetic  was  given 
above  in  (3.5).  It  implies  that  overflow  and  underflow  do  not  occur;  we  will  discuss  suscepti- 
bility to  overflow  and  underflow  below.  Our  analysis  will  be  linearized  in  the  sense  that  we 
wiU  replace  quantities  like  (H-ie)*(l  +  ^e)  by  l  +  (i+J)€  and  il+i€)/{l+je)  by  l-t-(i-y)«; 
such  approximations  can  be  made  rigorous  by  assuming  all  terms  of  the  form  te  arc  less  than 
.1  in  magnitude  and  increasing  the  final  error  bound  by  a  factor  1.06  [Wilkinson,  p.  113]. 

Lemma  5:  Let  cosG,  sine  and  p  denote  the  exact  outputs  of  ROT  for  inputs  /and  g  and  exact 
arithmetic.  Now  consider  the  floating  point  version  of  ROT  applied  to  the  perturbed  inputs 
/  =  /(!+€/)  and  i  =  *(l  +  «,),  and  let  «  =  (l-(-€„)cose,  *ii  =  (H-c„)sine,  and 
r  =  (l-t-€r)p  denote  the  computed  results,  where  we  assume  neither  overflow  nor  underflow 
occurs.  Then  we  may  estimate  the  relative  errors  «„,  €„  and  «,  as  follows: 

21 
e„  =  (€/-€^)sin^e  -I-  «„'    .where     |ec/|  s  — e 

21 
e„  =  (€j-€/)cos2e  4-  €„'    .where     |e„'|^-^« 

13 
€,  =  tjSin^e  +  e/cos^e  +  e/    .where     Ic/ |  s  -— e 

Proof:  We  only  consider  the  case  |/|>  |||;  the  other  case  is  analogous.  In  the  following  es 
with  numeric  subscripts  indicate  independent  quantities  bounded  in  magnitude  by  e.^^Then 
applying  (3.5)  systematically  to  the  expressions  in  ROT,  and  using  the  fact  that  |i//|<l, 
yields 

I  =    ^  (l+C,-C/+€,) 

«  =  (l  +  e4)[(l  +  .3)(l  +  ''(l  +  «2))]"^ 

=  (l-^7€5/4)[l-^r2]i'2 

=    (l  +  7€5/4)[l+(j//)2(l    +    2(€,- €/+«,))]"" 

=  (l  +  7€5/4)(l  +  (€,-t^t,)(«//)'/(l+(/?//)'))[l+(«//)'l"' 

=  (1  -I-  9t6/4  +  (e4-€/)sin^e)sece 

cs  =  (l-l-e7)/«  =  (1  +  13eg/4  -f-  (€/-e,)sin^e)cose 

sn  =  {l+€g)ics  =  (1  +  2l€io/4  -t-  (tj-e/)cos^e)sine 

r  =  (H-tii)/"  =  (1  +  13ti2/4  -I-  e^sin^e  -I-  e^s^e)p 

D 

To  analyze  the  errors  in  the  implicit  zero-shift  QR  algorithm,  we  need  to  investigate 
how  the  errors  accumulate  from  one  pass  through  the  loop  to  the  next.  It  turns  out  the  errors 
in  /and  oldcs  are  the  essential  ones: 

Lemma  6:  Let/,  and  otdcs,  denote  the  true  values  of /and  oldcs  at  the  entry  to  the  i-th  itera- 
tion of  the  loop  in  the  implicit  zero-shift  QR  algorithm.  Let  flj,  and  62,  be  the  true  values  of 
the  two  rotation  angles  in  the  i-th  iteration  of  the  loop.    In  other  words,  /,,  oldcs,,  61,  and  62, 
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are  the  values  that  would  have  been  computed  had  all  arithmetic  been  exact.  Let/,(l  +  c/,) 
and  oldcs,('y  +  €oUcii)  denote  the  actual  floating  point  values  of  /  and  oldcs  at  the  top  of  the 
loop,  with  all  previous  loop  iterations  having  been  done  in  floating  point  without  any  over- 
flows or  underflows.  Then 


l«//+ll 

£ 

sin^e,, 

0 

IvJ 

+ 

*oWoi+l  1 

2cos^ei,-sin^e2, 

sin^ej, 

l«oW«/l 

25€  /4 
21e  /4  +  21e-sin^e2,  /2 


(8.1) 


In  terms  of  these  expressions,  we  can  bound  the  errors  in  the  computed  values  of  e{i)  and 


where 


and 


e(i)  =  'true  e(0"(l  +  «,(o)     and    *(0  =  'true  *(0"-(l  +  €,(o) 


«<(0    =      -   «oWcj,-COS^e2,   -   2€/,-COs2ei,-COS^e2,  +   «/(  +  i-CO$^ei.,  +  i    +   20€ 


«j(0  =  «/r«'*^*i''«>s2e2,  +  €ou„,-cos^e2,  +  -^eu  +  — sin2e2,€i5 


2 


€,(,-)  and  €j(,-)  may  be  further  bounded  by: 

l«.(ol  s  l«oW«,l  +  2|€/J  +   [eyi^il  +  20£ 


and 


|e.(ol  =s   l«/,l  +   l^oucsil  +  25£sin='e2,  /4  +  15e/2     . 


(8.2) 

(8.3) 

(8.4) 
(8.5) 


Proof:  We  apply  (3.5)  and  Lemma  5  systematically  to  the  expressions  in  the  algorithm.  As 
before,  «s  with  numeric  subscripts  denote  independent  expressions  bounded  in  magnitude  by 
c. 

Note  that  at  the  top  of  the  loop,  g  is  known  exactly.  Therefore,  after  the  first  call  to  ROT  we 
have 

cs  =  cosei,-(l  +  e/;-sin^ei,  +  — -«i) 

4 

sn  =  sine,,-(l  -  e/,-cos^ei,  +  —€2) 

1  ^ 
r  =  'true  r"(l  +  e/j-cos^Oi,  +  — «3) 


The  errors  in  f,  g  and  h  are  given  by 
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/  =  {l+u)-oldcs-r  =  'true  f'(l  +  Cow^.  +  c^.-cos^ei,  +  -^£5) 


25 


g  =  (l  +  e6)j(«  +  l)Mn  =  'true  g'(l  -  c.-cos^e,,  +  -^e,) 

4 

A  =  il  +  ii)-s{i  +  l)cs  =  'true  *"•(!  +  c/j-sin^e,,  +  — e,) 

4 


After  the  second  call  to  ROT  we  have 
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21  21 

cs  =  cose2,-(l  +  «<,w<:j,-"d^®2.  +  2«/,-cos^ei,sin^e2i  +  — tio-sin^e2,  +  -^«n) 

21  21 

sn  =  sme2,(l  -  eoWcj/-cos^e2,  -  2€/,-cos^ei,cos^e2,  +  -r-«i2-cos^e2,  +  — «u) 

1  ^  2S 

r  =  'true  r"-(l  +  €/,-cos^ei,-cos2e2,  +  €oUcji-«'*^®2i  +  "2"*'*  ■*"  "^'"'"""^^Z') 

Since  oUcj  =  c«  and  /  =  /i  at  the  bottom  of  the  loop,  we  have  shown  that  at  the  start  of  the 
next  loop 

25 


/  =  /,+,(!  +  €/,-8in2e„  +  — «9) 

oldcs  =  oUcs, +  1(1  +  €<,u„/-8in^62/  +  2€^-cos2ei;-sin^e2,  +  ^tio-tm^^n  +  -^'ii) 


21  •  2»      .    21 


as  desired.  The  expressions  for  the  errors  in  e{i)  and  *(i)  follow  from  plugging  the  above 
bounds  into  the  expressions  e  (i  - 1)  =  oldsn*r  and  s  (1)  =  r.   D 

From  (8.4)  and  (8.5)  we  sec  that  the  errors  in  the  computed  «(()  and  s{i)  are  governed 
by   the  errors  «^.   and  f.oidcsi<  »Dd   that  the  growths  of  these  errors  are  governed  by  the 
recurrence  (8.1).    A  simple  but  somewhat  pessimistic  bound  on  these  errors  is  given  by 
Lemma  7:  Let  t/,,  ioidcsi>  «f(0  and  €,(,■)  be  as  in  Lemma  6.  Then 

|e/,|s25(i-l)e/4 

UoMcl  =s  113(.-l)t/4 

|€,(ol  s  (138i-53)e/4 

|«,(ol  s  (113i-58)t/4 


Proof:  Replace  the  recurrence  (8.1)  by 


£,  +  1  —  Ai'Ei  +  Fi 


where 


A,  = 


sin^ej, 


1.  0 

2cos^ei,-sin^e2i   sin^e2, 


F,  = 


25e/4 
2l€/4  +  2l€-sin^e2,  /2 


and  the  entries  of  E,  are  upper  bounds  for  |e/J  and  |€ojrf„,|-  Taking  £i  =  0,  this  recurrence 
has  the  solution 


^.  +  1  =   2  (    n     -^t)^; 


Trivial  bounds  for  A,  and  F,  arc 
|A,I  s 


sin^Oi,     0 
2cos^ei,    1 


and     \F,  |  s 


25t/4 
63«/4 


(8.6) 


(8.7) 


A  simple  induction  shows  that 


n 

*  "J + 1 


sin^e,*     0 
2cos2en   1 


n   sin^Ou         0 
k-j*\ 

2(1  -     ri    sin^ei*)   1 

*-;  +  ! 


and  the  rest  of  the  proof  is  a  straightforward  computation,  n 
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In  other  words,  the  relative  errors  in  the  computed  e{i)  and  s(i)  are  bounded  by  a 
linear  function  of  i,  and  so  the  largest  relative  error  is  bounded  by  a  linear  function  of  the 
matrix  dimension  n.  We  can  now  apply  Theorem  2  of  section  2  to  bound  the  errors  in  the 
singular  values  of  the  transformed  matrix: 

Theorem  5:  Let  fi  be  an  n  by  n  bidiagonal  matrix  and  B'  the  matrix  obtained  by  running  the 
implicit  zero-shift  QR  algorithm  on  B.  Let  the  singular  values  of  0  be  aji  •  •  •  ^<t„,  and 
the  singular  values  of  B'  be  aj'et  •  •  •  sa,'.  Then  if 

u  -  69n2e<l   ,  (8.8) 

the  relative  differences  between  the  singular  values  of  B  and  the  singular  values  of  B'  are 
bounded  as  follows: 

k,-cr,'| 


1-w 


Let  B).  be  the  matrix  obtained  after  k  repetitions  of  the  implicit  zero-shift  QR  algorithm,  and 
let  a^i^  ■  ■  ■  ^oia,  be  its  singular  values.    Then  if  condition  (8.8)  holds  we  have 

l*^'""^*-!  ^  (7; u    "  ^)  ■  *^'  ~  69tn^e-a,    , 

(l-<o)* 

where  the  approximation  to  the  last  upper  bound  holds  if  ibxCl- 

Proof:  Plug  the  bounds  of  Lemma  7  into  Theorem  2.  D 

Actually,  Lemma  7  and  Theorem  S  are  quite  pessimistic,  since  the  upper  bounds  in 
(8.7)  are  unattainable.  In  fact,  as  we  approach  convergence,  we  expect  the  rotation  angles  6], 
and  62,  to  approach  zero,  which  means  the  matrix  A,  should  approach  zero.  We  can  use  this 
fact  to  obtain  a  much  stronger  error  bound  in  the  region  of  convergence: 

Lemma  8:  Let  €//,  f.oidcst<  <f(o  *°<^  *j(0  ^  ^^  i°  Lemma  6.  Assume  further  that  all  the  rota- 
tion angles  6  during  the  course  of  the  algorithm  satisfy  sin^O  £  t  <  1.  Then 

25e 


I' 


4(1-t) 

_       SOre  21t€  21« 


l«.(0  I  =s 
Proof:  Use  the  bounds 


-  -1- 

21t€ 

2(1-t) 

-(- 

21t€ 

2(1-t) 

-1- 

21TC          , 

"""''    4(1-t)^    2(1-t)    4(1-t) 
I    I  ^   SOtc    ,   21t€   ,  24e  ,  »f,^ 


50t€    ,   2lTe   ,    23e    ,  25t€  ^  15« 


4(1-t)2         2(1-t)         2(1-t)  4  2 


T     0 

2t   t 


and     \Fi  \ 


25C/4 
21«/4  +  21eT/2 


The  rest  of  the  proof  is  a  straightforward  computation  from  (8.6).  □ 

These  bounds  permit  us  state  the  following  improvement  to  Theorem  5: 

Theorem  6:  Let  B  be  an  n  by  n  bidiagonal  matrix  and  B'  the  matrix  obtained  by  running  the 
implicit  zero-shift  QR  algorithm  on  B.  Assume  that  all  the  rotation  angles  6  during  the  course 
of  the  algorithm  satisfy  sin^e  s  t  <  1.  Let  the  singular  values  of  fi  be  aj^  •  •  •  aa„,  and 
the  singular  values  of  B '  be  a  j '  ^  ■  •  •  a  a„ ' .  Then  if 

88n€      ^  , 

«"7Tr;^<i.  (8.9) 

the  relative  differences  between  the  singular  values  of  B  and  the  singular  values  of  B '  are 
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bounded  as  follows: 


I  » I  ^      •" 


Let  Bk  be  the  matrix  obtained  after  k  repetitions  of  the  implicit  zero-shift  QR  algorithm, 
where  we  assume  all  rotation  angles  6  satisfy  sin^B  s  t  <  1.  Let  a/n^  •  •  •  ^at,  be  the 
singular  values  of  B*.   Then  if  condition  (8.9)  holds  we  have 

,       ,       1  ,x  _    88tne 

where  the  approximation  to  the  last  upper  bound  holds  if  t<i)<:l. 
Proof:  Plug  the  bounds  of  Lemma  8  into  Theorem  2.  □ 

Thus,  if  the  rotation  angles  are  all  bounded  away  from  ir/2,  the  error  after  k  iterations 
of  the  implicit  zero-shift  QR  algorithm  can  grow  essentially  only  as  the  product  kn.  The  algo- 
rithm can  easily  compute  t  as  it  proceeds,  and  so  compute  its  own  error  bound  if  desired.  In 
the  numerical  experiments  in  section  7,  we  observed  no  error  growth  at  all,  and  so  as  is  often 
the  case  an  algorithm  behaves  much  better  in  practice  than  rigorous  error  bounds  can  guaran- 
tee. 

Now  we  present  a  backward  error  analysis  of  the  Cholesky  decomposition  of  a  positive 
definite  symmetric  tridiagonal  matrix  A.  Our  goal  is  to  show  that  the  computed  Cholesky  fac- 
tor Z,  is  a  small  componentwise  relative  perturbation  of  the  exact  Cholesky  factor  of  a  small 
componentwise  relative  perturbation  of  A.  This  result  is  needed  in  section  5.  We  assume  that 
the  floating  point  square  root  function  sqri  satisfies  sqrt(a)  =  (H-*)a"^  where  \e  |s€. 

Theorem  8:  Let  /I  be  an  n  by  n  positive  definite  symmetric  tridiagonal  matrix  with  diagonal 
entries  oi,  .  .  .  ,a„  and  off  diagonal  entries  bi,  .  .  .  ,fc„-i.  Let  L  be  the  computed  Cholesky 
factor  from  the  following  algorithm: 

'ii  =  •»?'■'(<»  i) 
for  /■  =  1  to  n  - 1 

endfor 

Then  barring__  over/underflow  and  ^attempts  to  take  square  roots  of  negative  arguments, 
L  =  L  +  hL,  \hL  I  s  5.5c  |Z,  |,  where  L  is  the  exact  Cholesky  factor  of  the  matrix  A  +hA=LL  , 
where  |8i4  |  £  3e  |A  |,  and  the  diagonal  of  hA  is  zero. 

Proof:  We  construct  hA  and  L  as  follows.  Subscripted  ts  denote  independent  quantities 
bounded  in  norm  by  c.  Then 

In  =  fl{.sqrt{a,))  =   (l  +  CiOal'^ 
//4i,,  =  flib./l,.)  =  (H-€,,)fc,//„ 
'i  +  l.i  +  I  =  /?(*9'''(a.  +  l -'?+!..)) 

=    (l+€,4)((l+e,3)(a,  +  l-(l  +  «.2)'^l.,))"' 
=    (l+1.5-£,5)-(fl,  +  ,-(l-t-t,2)(l  +  €n)'i'? //?,)'" 

Let/,,  =  /i,/(l-(-1.5t,_,,5),  with  €0,5  =  2eii/3.  Then 
/„  =  /„/(l+e„)  =  a\'^ 
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/V,,  +  l   =    (fl.M   -    (l+€,2)(l  +  *„W/('^(l+l-5€.-1.5)'))"' 

A  A  A  A 

Let  /,  +  !,(  =  bi/lu-  Then  L  is  the  exact  Cholesky  factor  of  A  +hA  where  the  diagonal  of  bA  is 
zero  and  |M,.(  +  i|  =   ISe^s*,  |  s  3€|i>,|.   Therefore  |6A  |  s  SeJA  |  as  desired. 

Note  that  |/u-/„|  s  1.5e|/„|.   Since 

;  fe.         (l  +  3c,6)(l  +  l.St,-i.5)»,         (l  +  3c,6)(l-H.St,-i,s)    . 

'i  +  i.(  -  •* ; r— li  +  i,i  ■  (l  +  5.5e/7)/,  +  ,, 

we  have  |Z.  -Z.  |  s  5.5€  |Z,  |  as  desired.  D 
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9.  Implementation  Details 

In  this  section  we  discuss  a  number  of  details  of  the  implementation  of  the  code: 

Chasing  the  bulge  up  or  down 

SVD  of  2  by  2  triangular  matrices  and  robust  shift  calculation 

Deflation  when  s  (i)  =  0 

Over/underflow 

Chasing  the  bulge  up  or  down.  A  bidiagonal  matrix  may  be  graded  in  many  ways,  but  most 
commonly  it  will  be  large  at  one  end  and  small  at  the  other.  The  QR  algorithm  (either  stan- 
dard or  implicit  zero-shift)  tries  to  converge  the  singular  values  in  order  from  smallest  to 
largest.  If  the  matrix  is  graded  from  large  at  the  upper  left  to  small  at  the  lower  left,  and  the 
"bulge"  is  chased  from  upper  left  to  lower  right  as  in  section  3,  then  convergence  will  be  fast 
because  the  singular  values  are  "ordered"  correctly,  i.e.  the  diagonal  matrix  entries  are  fairly 
close  to  their  final  values.  If,  however,  the  matrix  is  graded  the  opposite  way  (from  small  at 
the  left  to  large  at  the  right)  then  the  algorithm  will  have  to  invert  the  order  of  the  matrix 
entries  as  it  converges.  This  may  require  many  more  QR  steps.  To  avoid  this,  the  implemen- 
tation tests  for  the  direction  of  grading  (simply  comparing  |ji1  and  \s„\),  and  chases  the 
bulge  in  the  direction  from  large  to  small.  If  a  matrix  breaks  up  into  diagonal  blocks  which 
are  graded  in  different  ways,  the  bulge  is  chased  in  the  appropriate  direction  on  each  block. 
The  algorithm  in  [Van  Huffel,  et  al]  does  this  as  well. 

This  means  the  singular  values  may  be  quite  disordered  in  the  final  converged  matrix, 
and  so  must  be  sorted  at  the  end  (along  with  the  singular  vectors  if  desired).  The  LINPACK 
SVD  uses  bubble  sort  at  the  end,  which  could  require  O^n"^)  swaps  of  singular  vectors,  which 
could  be  expensive.  The  new  algorithm  uses  insertion  sort  instead,  which  does  at  most  2n 
vector  moves.  At  the  cost  of  using  a  floating  point  array  to  store  integer  pointers,  this  may 
be  decreased  to  the  minimum  of  n  vector  moves. 

SVD  of  2  by  2  triangular  matrices  and  robust  shift  calculation.  The  need  for  the  singular  value 
decomposition  of  2  by  2  triangular  matrices,  or  at  least  the  smallest  singular  value  of  such  a 
matrix,  arises  in  two  places  in  the  code.  The  first  time  is  when  calculating  the  shift.  As  stated 
in  section  3,  the  standard  choice  of  shift,  called  Wilkinson's  shift,  is  the  smallest  eigenvalue 
of  the  bottom  2  by  2  block  of  BB^ .  It  is  easy  to  see  that  this  is  the  square  of  the  smallest 
singular  value  of  the  bottom  2  by  2  block  of  B.  The  second  need  for  the  SVD  of  a  2  by  2  tri- 
angular matrix  arises  when  the  code  has  isolated  a  2  by  2  block  on  the  diagonal  of  B.  Even 
though  this  appears  to  be  an  easy  case  for  the  algorithm  in  section  4,  it  turns  out  that  round- 
off can  prevent  convergence  when  the  singular  values  are  close.  This  is  the  case  in 


B  = 


a  b 

0   c 


when  \a  \  and  \c  \  are  close  and  b  is  much  smaller,  just  larger  than  €•  |a  |.  It  turns  out  that 
through  an  accident  of  roundoff,  b  is  often  no  smaller  after  one  step  of  QR  than  before,  so 
that  the  algorithm  never  converges.  It  is  also  difficult  in  this  situation  to  compute  the  singular 
vectors  accurately,  just  as  eigenvectors  corresponding  to  multiple  eigenvalues  are  difficult  to 
compute. 

To  get  around  these  difficulties,  subroutine  SIG22  takes  the  entries  a,  b  and  c  of  B  and 
returns  the  two  singular  values  as  well  as  the  left  and  right  singular  vectors.  Barring  over- 
flow and  underflow,  the  returned  values  are  accurate  to  nearly  full  machine  precision,  even 
for  nearly  coincident  singular  values.  This  property  is  based  on  the  fact  that  SIGH  uses  for- 
mulas for  the  answer  which  contain  only 

products,  quotients,  and  squsire  roots. 
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sums  of  terms  of  like  sign, 

differences  of  computed  quantities  only  when  cancellation  is  impossible,  and 
the  difference  \a  \  -   \c\,  which,  if  cancellation  occurs,  is  exact. 
It  is  straightforward  to  use  these  properties  to  show  that  the  final  result  is  correct  to  nearly 
full  precision. 

The  code  is  also  robust  in  the  face  of  over/underflow.  Overflow  is  avoided  where  possi- 
ble by  using  formulas  in  terms  of  ratios  of  matrix  entries,  and  choosing  the  formulas  so  that 
the  ratios  are  always  bounded  by  1  in  magnitude.  As  a  result  of  these  precautions,  overflow 
is  impossible  unless  the  exact  largest  singular  value  itself  overflows  (or  is  within  a  few  units 
in  the  last  place  of  overflowing).  Underflow  (of  the  conventional  "store  zero"  variety)  can 
damage  the  results  only  if  the  data  and/or  results  are  themselves  close  to  the  underflow  thres- 
hold, specifically  less  tiian  the  underflow  threshold  divided  by  e.  Gradual  underflow  [Dem- 
mcl]  makes  the  calculation  of  the  singular  values  impervious  to  underflow  (unless  the  final 
results  themselves  underflow)  and  the  singular  vectors  much  less  susceptible  to  underflow 
problems. 

Deflation  when  j(i)  =  0.  The  standard  SVD  algorithm  [LINPACK]  has  special  code  to  handle 
the  case  when  j(0  =  0.  This  code  does  a  simplified  sequence  of  rotations  (similar  to  implicit 
zero-shift  QR)  to  introduce  a  zero  on  the  superdiagonal  of  the  bidiagonal  matrix  and  so 
break  it  into  two  simpler  problems.  It  is  easy  to  see  that  the  implicit  zero-shift  QR  algorithm 
does  this  deflation  automatically,  yielding  one  zero  on  the  superdiagonal  for  each  zero  on  the 
diagonal.  This  occurs  after  one  pass  of  the  algorithm.  Furthermore,  after  one  pass  both  s{n) 
and  e  (n  - 1)  will  be  zero,  meaning  that  the  zero  singular  value  has  been  deflated  exactly. 

We  can  see  this  as  follows.  Whenever  s(i  +  \)  =  0,  both  g  and  h  will  be  set  to  0,  causing 
the  sn  returned  by  the  second  call  to  ROT  to  be  0.  At  the  end  of  the  loop,  both  f  =  h  and 
oldsn  =sn  will  also  be  zero.  In  fact,  it  is  easy  to  see  that  from  now  on  both  h  and  the  /  at  the 
bottom  of  the  loop  will  be  zero:  at  the  top  of  the  next  loop  iteration,  the  zero  value  of  / 
causes  the  first  call  of  ROT  to  compute  cs  =  0;  this  causes  h  =  s{i  +  l)'cs  to  be  zero  and  the 
pattern  repeats.  Also,  when  oldsn  =  0  (which  happens  when  j(j-(-1)  =  0),  e(i  — 1)  is  set  to  zero 
on  the  next  iteration,  i.e.  s(i +  1)=0  implies  e(i)  becomes  zero.  Finally,  at  the  end  of  all  the 
loop  iterations,  h  is  still  zero  implying  both  e(n-l)  and  s(n)  are  set  to  zero. 

When  /  is  zero,  as  it  frequently  is  in  this  case,  the  first  call  to  ROT  need  only  set  cs  =  0, 
*n  =  1  and  r=g;  this  is  what  the  first  "if"  branch  in  ROT  does. 

Over/underflow.  Most  of  the  error  analyses  presented  here  can  be  extended  to  take 
over/underflow  into  account.  Techniques  for  error  analysis  in  the  presence  of  underflow  are 
discussed  in  [Dcmmel].  If  over/underflow  is  handled  as  suggested  in  the  IEEE  Floating  Point 
Standard  [IEEE],  then  using  Sylvester's  theorem  to  count  the  number  of  eigenvalues  less  than 
X  (2.1)  can  be  made  completely  impervious  to  over/underflow  [Kahan]:  If  some  d,=  ±0,  then 
rf,  +  i  =  ±oo  and  d,+2  =  fl,+2,  and  we  count  the  number  of  d,  whose  sign  bit  is  negative  (i.e. 
including  -0  and  -»).  Rules  for  arithmetic  with  ±0  and  ±<»  are  described  in  detail  in 
[IEEE]. 
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10.  The  Accuracy  of  the  Compated  Singular  Vectors 

In  this  section  we  assess  the  accuracy  of  the  computed  singular  vectors.  Just  as  with  the 
standard  SVD,  the  new  algorithm  guarantees  a  small  residual  in  the  sense  that  both  ||Bv  — au|l 
and  ||B'^u-av||  are  on  the  order  of  e||B||,  where  ct  is  the  computed  singular  value  and  u  and  v 
arc  the  computed  singular  vectors.  However,  in  contrast  to  the  singular  values,  high  relative 
accuracy  in  the  bidiagonal  matrix  entries  does  not  guarantee  high  relative  accuracy  in  the 
singular  vectors;  we  will  give  a  2  by  2  example  to  iUustrate  this.  It  also  turns  out  to  be  impos- 
sible to  guarantee  a  tiny  componentwise  relative  backwards  error,  where  each  computed 
singular  vector  of  B  would  be  the  exact  singular  vector  of  a  small  componentwise  relative 
perturbation  B  +IB  of  B,  with  |8B  |  s  ii  |B  |,  tj  on  the  order  of  machine  precision.  We  will 
also  demonstrate  this  with  a  small  example. 

In  place  of  such  simple  a  priori  forward  or  backward  error  bounds,  our  bounds  will 
depend  on  the  singular  value  distribution.  Briefly,  the  closer  together  singular  values  are,  the 
less  accurately  their  corresponding  singular  vectors  can  be  computed.  This  dependency  is  cap- 
tured in  the  well  known  "gap"  theorem  [Parlett,  p.  222]  which  can  be  used  to  show  that  the 
angular  error  in  the  computed  singular  vectors  corresponding  to  ct,  is  bounded  by  the  largest 
roundoff  error  committed  divided  by  the  "gap"  or  difference  between  a,  and  its  nearest 
neighbor  <t,±].  This  well  known  bound  holds  for  the  standard  SVD  applied  to  dense  matrices 
as  well  as  the  new  algorithm. 

Although  we  have  not  proven  anything  rigorously,  numerical  experience  leads  us  to 
make  the  following  conjecture  for  the  new  algorithm  applied  to  bidiagonal  matrices  which 
would  significantly  strengthen  the  bound  in  the  last  paragraph:  the  "gap"  min|a,  — a,£i  |  in  the 
denominator  of  the  above  error  boimd  can  be  replaced  by  the  "relative  gap" 
mln(|CTi-(T,*|  |/o-,).  Since  the  relative  gap  can  be  much  larger  than  the  gap,  the  resulting 
error  bound  can  be  much  smaller.  For  example,  if  B  is  3  by  3  bidiagonal  matrix  with  singular 
values  o'i  =  l,  ct2  =  2-10~^°  and  (73=  10"^,  the  old  error  bounds  for  the  vectors  corresponding 
to  the  two  tiny  singular  values  arc  on  the  order  of  10^°c  since  the  gap  is  10~^°.  However,  the 
conjectured  bounds  are  both  e  since  the  relative  gap  between  2-10"^  and  10"^  is  1.  Proving 
this  conjecture  rigorously  remains  an  open  problem. 

Now  we  present  a  two  by  two  example  showing  that  small  relative  perturbations  in  the 
entries  of  a  bidiagonal  matrix  can  cause  large  perturbations  in  the  singular  vectors: 


AM  = 


0       1 


As  t]  varies  from  0  to  e,  an  easy  computation  shows  that  both  left  and  right  singular  vectors 
rotate  by  22. S  degrees. 

The  same  example  can  be  used  to  show  that  no  tiny  componentwise  relative  backward 
error  bound  can  hold.  Specifically,  let  u,  and  v,  be  the  left  and  right  unit  singular  vectors  of 

1  +  e   < 
0      1 

computed  by  the  new  algorithm  (for  ease  of  presentation  we  ignore  the  fact  that  2  by  2 
matrices  are  handled  specially  by  the  algorithm;  this  same  phenomenon  holds  for  larger 
examples  as  well).  Suppose  that  u,  and  v,  differ  by  at  most  C|  from  the  exact  unit  singular 
vectors  u,  and  v,  of  a  componentwise  relative  perturbation  A  +hA  of  A,  where  |M  |  s  €2  |j4  |. 
Then  if  €2  =  o(€^'^),  ei«2  =  tl{t).  In  other  words,  cj  is  fl(e"').  Therefore,  any  attempt  to 
prove  a  small  componentwise  relative  backward  error  o(t^'^)  must  permit  errors  in  the  com- 
puted vectors  at  least  as  large  as  €"■'»£. 

The  proof  goes  as  follows.  Applying  the  new  algorithm  to  A  (and  ignoring  the  fact  that 
2  by  2  matrices  are  handled  specially),  we  set  A12  to  0  and  get  the  columns  of  the  identity 
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matrix  as  left  and  right  singular  vectors.  Now  we  make  relative  perturbations  of  size  at  most 
e2  =  €'*  (a>2/3)  in  each  entry  of  i4  (here,  |«2il^«2)  = 


B  =  A  +  bA  = 


(l  +  e)(l+e2,)    €(l+€22) 
0  1  +  C23 


Compute 


B^B  =  I  + 


2€  +  2e2i       c 
«  2*23 


/  +  t 


X     1 

1  y 


+  O^e^  +  el) 


-  /  +  eC  +  0{f.^  +  tl) 

where    |j:  |  s  2+2«2/€   and    ly  |  s  2*2 /«•    We  consider  the  eigenproblem  for  C.   Suppose 
[1  r\]^  is  an  eigenvector  of  C;  we  will  show 

hi  s  3  +  4€2/e 


3  +  4€2/e 


implying  the  angle  between  an  eigenvector  of  C  and  [1  0]^  is  n(e/€2)  =  n(e'   ").  We  com- 
pute as  follows.    If  [1  -n]^  is  an  eigenvector  of  C,  t\  must  satisfy  t)^  + (x -y)T)- 1  =  0  or 


iLZll  ±  (((y-^) 


-?  +  1) 


1/2 


2  ''2 

Since  \(y  -x)/2\  s  l-t-2«2/«.  it  is  easy  to  see  both  \f\  \  and  |-n~'  |  are  bounded  by 

l  +  2€2/t  +  ((l+2€2/e)^+l)"^  s  3  +  4e2/€ 

as  desired. 

Now  consider  the  so  far  ignored  perturbation  0(e^  +  e|).  The  gap  between  the  eigen- 
values of  C  is  computed  to  be  ((j->)^  +  4)"2  a  2.  Thus  the  perturbation  0(€^  +  €^)  can 
change  the  eigenvectors  by  at  most  0(e-l-€^/€).  When  €2  =  *'",  this  is  a  perturbation  of  at 
most  0(€^°"').  But  when  a>2/3,  2a-l>l-a  and  so  the  perturbation  cannot  change  the 
lower  bound  n(€'~'")  on  \t\\. 

Thus,  a  relative  perturbation  of  size  e°  (a>2/3)  to  A  means  the  right  singular  vectors 
are  least  fl(€'"")  =  n(ei)  away  from  the  computed  right  singular  vectors.  Thus  €i-«2  =  n(e) 
as  desired. 

Since  our  algorithm  handles  2  by  2  matrices  as  special  cases,  a  4  by  4  matrix  like 


1    1    0 

0 

0     1     € 

0 

0   0    1 

1 

0   0  0 

1 

could  be  used  in  the  proof,  but  the  computations  are  more  complicated. 

As  stated  above,  rigorous  error  bounds  on  the  computed  singular  vectors  depend  on  the 
singular  value  distribution,  and  that  the  closer  together  singular  values  are,  the  less  accurately 
their  corresponding  singular  vectors  can  be  computed.  The  "gap"  theorem  [Parlett,  p.  222] 
expresses  this  dependency  by  bounding  the  angular  error  in  the  computed  singular  vectors  by 
the  residual  ||Bv,-o,«,  ,  sju,- a, i,\\  (the  norm  of  the  n  by  2  matrix  [Bv,-a,u,  ,  B''iJ,-a,v,]) 
divided  by  the  gap  (here  u,,  v,  and  a,  are  the  computed  singular  vectors  and  singular  value). 
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Standard  backward  error  analysis  shows  that  the  residual  may  be  bounded  by  largest  round- 
off error  committed  (which  is /(n)€||B||,  /(n)  a  modest  function  of  n  and  e  the  machine  pre- 
cision).  This  yields  the  error  bound 

max(e(u,,M,)  ,  e(0„v,))  s  /(«)€  1|B||  /  gap  -  /(«)€  ||B||  /  min  |(T,-a,:til    ;      (10.1) 
here  u,  and  v;  are  the  exact  singular  vectors. 

The  bound  (10.1)  is  true  for  the  standard  SVD  of  a  dense  matrix  as  well  as  the  new 
algorithm.  A  natural  question  is  whether  the  bound  can  be  improved  for  the  new  algorithm 
applied  to  the  bidiagonal  singular  value  problem.  Although  we  have  no  rigorous  results, 
numerical  experience  supports  the  following 

Conjectnre:  Let  fi  be  an  unreduced  bidiagonal  matrix  with  singular  values  a,  and  left  and 
right  singular  vectors  u,  and  v,.  Let  the  singular  vectors  computed  by  the  new  algorithm  be  w^ 
and  V,.  Then  the  errors  in  it,  and  v,  are  bounded  by 

max(e(u,,u,)  ,  e(v„v,))  s  /(n)€  /  relative  gap  ■  /(n)«  /  min  (,\ai-at±i\lai)    .  (10.2) 

The  justification  for  this  conjecture  is  as  follows.  In  section  8  we  proved  that  the  zero- 
shift  part  of  the  algorithm  is  forward  stable  across  a  single  QR  sweep;  numerical  experience 
indicates  that  it  is  actually  forward  stable  across  many  QR  sweeps.  This  forward  stability 
means  the  accumulated  transformation  matrices  are  computed  accurately.  Thus,  the  only  seri- 
ous errors  are  committed  on  convergence:  setting  an  offdiagonal  to  zero.  If  we  use  a  "conser- 
vative" convergence  criterion,  where  only  offdiagonals  smaller  than  €cr„i„  are  set  to  zero, 
the  numerator  in  (10.1)  is  reduced  from  /(n)€||B||  to  /(n)€an,j„,  which  implies  the  conjec- 
ture. Extending  this  argument  to  the  stopping  criterion  described  in  section  4  appears  diffi- 
cult, and  it  is  possible  that  with  the  more  conservative  stopping  criterion  the  algorithm  will 
occasionally  compute  more  accurate  vectors  than  the  criterion  of  section  4.  Therefore  a  user 
interested  in  accurate  singular  vectors  might  have  to  pay  a  performance  penalty  of  30%  - 
40%  in  order  to  use  the  more  conservative  stopping  criterion  instead  of  the  faster  one. 

Unfortunately  it  is  impossible  to  use  the  gap  theorem  straightforwardly  to  estimate  the 
error  in  the  computed  singular  vectors  because  the  residual  cannot  be  computed  accurately:  if 
u,  V  and  a  are  the  correctly  rounded  version  of  the  exact  answers,  the  error  in  rounding  alone 
could  well  make  the  residual  \\Av-au  ,  A'^u-av\\  as  large  as  €||A||,  independent  of  a.  This 
would  be  the  case  if  u  and  v  had  large  components  corresponding  to  large  components  of  A, 
so  that  the  residual  could  be  small  only  through  massive  cancellation.  On  the  other  hand,  if 
A,  u  and  v  are  graded  in  such  a  way  that  individual  terms  oav,  and  a,,  +  iVi  +  i  in  the  product 
Av  are  small  (similarly  for  a,  +  i.iu,  and  a„tt,  in  A^u)  then  the  computed  residual  may  well  be 
small  compared  to  ct;  in  this  case  the  computed  singular  vectors  will  be  quite  accurate.  The 
residual  in  the  gap  theorem  can  however  be  accumulated  from  the  offdiagonals  set  to  zero 
during  the  algorithm,  thus  avoiding  the  problem  of  inaccurate  residuals  entirely:  Note  that 
only  a  subset  of  the  annihilated  offdiagonals  contribute  to  the  residual  for  any  singular  vector 
(once  a  block  is  deflated  and  the  desired  singular  value  belongs  to  the  other  block,  aimihila- 
tions  in  the  deflated  block  don't  count  any  more).  Thus  it  seems  for  a  modest  cost  we  can 
compute  error  bounds  for  one  or  more  singular  vectors;  the  cost  is  keeping  track  of  which 
offdiagonals  were  annihilated  in  which  order. 
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11.  CoDclasions  and  Open  Problems 

We  have  described  a  method  for  computing  all  the  singular  values  of  a  bidiagonal 
pf  ^  c°^r^^",  ^  ^""  machine  precision,  and  showed  it  to  be  comparable  in  speed  to  the  LIN- 
rACK  SVD  algorithm.  This  computation  is  justified  because  small  relative  errors  in  the  bidi- 
agonal entries  (from  roundoff  in  the  algorithm  or  from  previous  computations)  can  only 
cause  small  relative  errors  in  the  singular  values,  independent  of  their  magnitudes.  The  tech- 
nique can  be  extended  to  computing  the  eigenvalues  of  certain  symmetric  tridiagonal  matrices 
with  high  relative  accuracy  as  well. 

A  number  of  open  questions  remain.  First,  how  accurate  are  the  singular  vectors  com- 
puted by  this  algorithm?  We  cannot  generaUy  expect  high  relative  accuracy  in  all  singular 
vectors,  because  clustered  singular  values  can  have  arbitrarily  Ul-conditioned  singular  vectors 
StiU.  It  appears  that  it  might  be  possible  to  compute  accurate  singular  vectors  as  long  as  the 
relative  difference  between  the  corresponding  singular  value  and  its  nearest  neighbor  b  large 
at  least  if  we  use  a  conservative  stopping  criterion  which  is  30%  to  40%  slower  than  the  one 
m  section  4.  When  in  practice  is  it  necessary  to  compute  such  accurate  singular  vectors  for 
tiny  clustered  singular  values?  Do  applications  demand  accurate  singular  vectors,  or  are  tiny 
residuals  sufficient,  and  if  so,  how  tiny?  ' 

Second,  since  we  have  shown  that  it  is  possible  to  obtain  accurate  singular  values  from 
r^lT  "."''t^""^  "f;^^'  ^'  "^y  "•^  ^1>"  it  is  possible  to  guarantee  accuracy  in  the 
cf«««  nf  '''^.•'«°°^  ^°^'°-  T*"''*  ^  ^^^"ly  °°*  P°"ible  in  general,  but  for  some  special 
t^^L,     °t  ^'"l''   as   diagonally   dominant  positive   definite  symmetric  tridiagonal 

matrices)  we  have  seen  that  reduction  to  bidiagonal  form  is  accurate.  For  what  other  special 
classes  is  this  true  (perhaps  diagonally  dominant  ones)? 

Third,  how  generally  can  our  impUcit  zero-shift  technique  be  employed  to  guarantee 
^^dTroXYv      ■"  "^  -«"-'"«'  A^  -"tioned  in  section  3,  a  simiL  technique  wa 
used  in  root-free  versions  of  symmetric  tridiagonal  QR;  can  it  be  modified  to  produce  a  tridi- 
agonal symmetric  QR  algorithm  which  guarantees  accurate  eigenvalues  for  at  least^me 
interesting  classes  of  symmetric  tridiagonal  matrices? 

fir...r'"^^\'^^V'  "^^  ^T  ""'"''  algorithm  for  high  accuracy  singular  values?  As  men- 
boned  in  section  3    zero-shift  QR  lends  itself  very  well  to  a  systolic  array  implement^tfon 

tesl  li' r^  i'  '  'V'  °°;  f  "''  '°  "^  '°"  "»  incorporate  shifts 'and 'convergenc^ 
tesung.  In  section  6,  we  showed  that  Sylvester's  Law  of  Inertia  could  be  used  to  compute 
high  accuracy  singular  values  via  bisection  (and  its  refinements).  Such  a  technique  h«Ten 
successfuUy  paraUelized  for  finding  eigenvalues  of  symmetric 'tridiagonal  mam^s[Loet 
al.].  The  answer  wiU  probably  depend  on  whether  aU  or  just  some  singular  valuef 'are 
desired;  in  the  latter  case  bisection  will  likely  be  superior. 
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